January 31, 2020

3307 words 16 mins read

Paper Group ANR 12



A Systematic Study of Leveraging Subword Information for Learning Word Representations

Title A Systematic Study of Leveraging Subword Information for Learning Word Representations
Authors Yi Zhu, Ivan Vulić, Anna Korhonen
Abstract The use of subword-level information (e.g., characters, character n-grams, morphemes) has become ubiquitous in modern word representation learning. Its importance is attested especially for morphologically rich languages which generate a large number of rare words. Despite a steadily increasing interest in such subword-informed word representations, their systematic comparative analysis across typologically diverse languages and different tasks is still missing. In this work, we deliver such a study focusing on the variation of two crucial components required for subword-level integration into word representation models: 1) segmentation of words into subword units, and 2) subword composition functions to obtain final word representations. We propose a general framework for learning subword-informed word representations that allows for easy experimentation with different segmentation and composition components, also including more advanced techniques based on position embeddings and self-attention. Using the unified framework, we run experiments over a large number of subword-informed word representation configurations (60 in total) on 3 tasks (general and rare word similarity, dependency parsing, fine-grained entity typing) for 5 languages representing 3 language types. Our main results clearly indicate that there is no “one-size-fits-all” configuration, as performance is both language- and task-dependent. We also show that configurations based on unsupervised segmentation (e.g., BPE, Morfessor) are sometimes comparable to or even outperform the ones based on supervised word segmentation.
Tasks Dependency Parsing, Entity Typing, Representation Learning
Published 2019-04-16
URL https://arxiv.org/abs/1904.07994v2
PDF https://arxiv.org/pdf/1904.07994v2.pdf
PWC https://paperswithcode.com/paper/a-systematic-study-of-leveraging-subword
Repo
Framework
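
For intuition, the sketch below composes a word vector by additively pooling hashed character n-gram embeddings, one of the simplest segmentation-plus-composition choices in the design space the paper explores (the framework also covers morpheme and BPE segmentation, position embeddings, and self-attention). The bucket count, dimensionality, and hashing scheme here are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    """Extract character n-grams from a word padded with boundary markers."""
    padded = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams

class NgramComposer:
    """Compose a word vector by averaging hashed character n-gram embeddings."""
    def __init__(self, dim=100, buckets=100_000, seed=0):
        rng = np.random.default_rng(seed)
        self.table = rng.normal(scale=0.1, size=(buckets, dim))
        self.buckets = buckets

    def embed(self, word):
        # hash() is illustrative only; a stable hash (e.g., FNV) would be used in practice
        rows = [hash(g) % self.buckets for g in char_ngrams(word)]
        return self.table[rows].mean(axis=0)  # additive composition

composer = NgramComposer()
print(composer.embed("unbelievable").shape)  # (100,)
```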

Infer Implicit Contexts in Real-time Online-to-Offline Recommendation

Title Infer Implicit Contexts in Real-time Online-to-Offline Recommendation
Authors Xichen Ding, Jie Tang, Tracy Liu, Cheng Xu, Yaping Zhang, Feng Shi, Qixia Jiang, Dan Shen
Abstract Understanding users’ context is essential for successful recommendations, especially for Online-to-Offline (O2O) recommendation, such as Yelp, Groupon, and Koubei. Different from traditional recommendation, where individual preference is mostly static, O2O recommendation must be dynamic to capture the variation of users’ purposes across time and location. However, precisely inferring users’ real-time context information, especially implicit context, is extremely difficult, and it is a central challenge for O2O recommendation. In this paper, we propose a new approach, called Mixture Attentional Constrained Denoise AutoEncoder (MACDAE), to infer implicit contexts and, consequently, to improve the quality of real-time O2O recommendation. In MACDAE, we first leverage the interaction among users, items, and explicit contexts to infer users’ implicit contexts, then combine the learned implicit-context representation into an end-to-end model to make the recommendation. MACDAE works well in the production system. We conducted both offline and online evaluations of the proposed approach. Experiments on several real-world datasets (Yelp, Dianping, and Koubei) show that our approach achieves significant improvements over state-of-the-art methods. Furthermore, an online A/B test shows a 2.9% increase in click-through rate and a 5.6% improvement in conversion rate on real-world traffic. Our model has been deployed in the “Guess You Like” recommendation product in Koubei.
Tasks
Published 2019-07-08
URL https://arxiv.org/abs/1907.04924v1
PDF https://arxiv.org/pdf/1907.04924v1.pdf
PWC https://paperswithcode.com/paper/infer-implicit-contexts-in-real-time-online
Repo
Framework
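
As background for the autoencoding idea, here is a minimal denoising autoencoder over a concatenated user/item/explicit-context feature vector, whose bottleneck serves as an implicit-context representation. This is a generic sketch, not MACDAE itself: the paper adds mixture components, attention, and constraints that are omitted here, and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class DenoisingAutoEncoder(nn.Module):
    """Minimal denoising autoencoder: corrupt the input, reconstruct it,
    and use the bottleneck as an (implicit) context representation."""
    def __init__(self, in_dim, hidden_dim=64, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, in_dim)

    def forward(self, x):
        corrupted = x + self.noise_std * torch.randn_like(x)
        z = self.encoder(corrupted)   # implicit-context representation
        recon = self.decoder(z)
        return recon, z

# x: concatenation of user, item, and explicit-context features (hypothetical dims)
x = torch.randn(32, 128)
model = DenoisingAutoEncoder(in_dim=128)
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)
loss.backward()
```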

Deep Learning for Audio Signal Processing

Title Deep Learning for Audio Signal Processing
Authors Hendrik Purwins, Bo Li, Tuomas Virtanen, Jan Schlüter, Shuo-yiin Chang, Tara Sainath
Abstract Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e. audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.
Tasks Information Retrieval, Music Information Retrieval, Speech Recognition
Published 2019-04-30
URL https://arxiv.org/abs/1905.00078v2
PDF https://arxiv.org/pdf/1905.00078v2.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-audio-signal-processing
Repo
Framework
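
The review treats log-mel spectra as the dominant input representation; a typical way to compute them with librosa is sketched below. The synthetic tone, sampling rate, and window/hop sizes are placeholder choices, not values taken from the article.

```python
import numpy as np
import librosa

# A synthetic one-second 440 Hz tone stands in for a real recording.
sr = 16000
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)

# Log-mel spectrogram: 25 ms windows with a 10 ms hop, 64 mel bands.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=400, hop_length=160, n_mels=64)
log_mel = librosa.power_to_db(mel)   # shape: (n_mels, n_frames)
print(log_mel.shape)
```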

Block Argumentation

Title Block Argumentation
Authors Ryuta Arisaka, Stefano Bistarelli, Francesco Santini
Abstract We contemplate a higher-level bipolar abstract argumentation for non-elementary arguments, such as: X argues against Y’s sincerity with the fact that Y has presented his argument to draw a conclusion C while omitting other facts which would not have validated C. Argumentation involving such arguments requires us to potentially consider an argument as a coherent block of argumentation, i.e. an argument may itself be an argumentation. In this work, we formulate block argumentation as a specific instance of Dung-style bipolar abstract argumentation with the dual nature of arguments. We consider internal consistency of an argument(ation) under a set of constraints, of graphical (syntactic) and of semantic nature, and formulate acceptability semantics in relation to them. We discover that classical acceptability semantics do not in general hold under these constraints. In particular, acceptability of unattacked arguments is not always warranted. Further, there may not be a unique minimal member in complete semantics, so sceptic (grounded) semantics may not be a subset of it. To retain set-theoretically minimal semantics as a subset of complete semantics, we define semi-grounded semantics. Through comparisons, we show how the concept of block argumentation may further generalise structured argumentation.
Tasks Abstract Argumentation
Published 2019-01-18
URL http://arxiv.org/abs/1901.06378v1
PDF http://arxiv.org/pdf/1901.06378v1.pdf
PWC https://paperswithcode.com/paper/block-argumentation
Repo
Framework
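
For readers new to abstract argumentation, the sketch below computes the grounded extension of a plain Dung framework by iterating the characteristic function. It illustrates only the baseline acceptability semantics the paper builds on, not the bipolar, constrained block setting the paper actually formalizes.

```python
def grounded_extension(arguments, attacks):
    """Grounded extension of a (finite) Dung abstract argumentation framework.

    arguments: iterable of argument names
    attacks:   set of (attacker, target) pairs
    """
    attackers_of = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    attacked_by = lambda s: {y for (x, y) in attacks if x in s}

    extension = set()
    while True:
        # Characteristic function: arguments defended by the current set.
        defended = {a for a in arguments
                    if attackers_of[a] <= attacked_by(extension)}
        if defended == extension:
            return extension
        extension = defended

# a attacks b, b attacks c: the grounded extension is {a, c}
print(grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")}))
```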

Policy Continuation with Hindsight Inverse Dynamics

Title Policy Continuation with Hindsight Inverse Dynamics
Authors Hao Sun, Zhizhong Li, Xiaotong Liu, Dahua Lin, Bolei Zhou
Abstract Solving goal-oriented tasks is an important but challenging problem in reinforcement learning (RL). For such tasks, the rewards are often sparse, making it difficult to learn a policy effectively. To tackle this difficulty, we propose a new approach called Policy Continuation with Hindsight Inverse Dynamics (PCHID). This approach learns from Hindsight Inverse Dynamics, built on Hindsight Experience Replay, so that learning proceeds in a self-imitative manner and can be carried out with supervised learning. We further extend it to multi-step settings with Policy Continuation. The proposed method is general and can work in isolation or be combined with other on-policy and off-policy algorithms. On two multi-goal tasks, GridWorld and FetchReach, PCHID significantly improves the sample efficiency as well as the final performance.
Tasks
Published 2019-10-30
URL https://arxiv.org/abs/1910.14055v2
PDF https://arxiv.org/pdf/1910.14055v2.pdf
PWC https://paperswithcode.com/paper/policy-continuation-with-hindsight-inverse
Repo
Framework
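
A rough sketch of the one-step idea: relabel each transition with the goal that was actually achieved, and train an inverse-dynamics policy by ordinary supervised learning. The state, goal, and action dimensions and the achieved_goal mapping below are hypothetical, and the paper's multi-step Policy Continuation is not shown.

```python
import torch
import torch.nn as nn

def hindsight_inverse_dynamics_pairs(trajectory, achieved_goal):
    """Turn a (possibly unsuccessful) trajectory into supervised data:
    for each transition, treat the *achieved* next state as the goal and
    ask which action led there (one-step hindsight inverse dynamics)."""
    data = []
    for (s, a, s_next) in trajectory:
        g = achieved_goal(s_next)    # hindsight goal from what actually happened
        data.append(((s, g), a))     # input: (state, goal); target: action
    return data

# Hypothetical inverse-dynamics policy trained with a supervised loss.
policy = nn.Sequential(nn.Linear(4 + 2, 64), nn.ReLU(), nn.Linear(64, 3))
loss_fn = nn.CrossEntropyLoss()

trajectory = [(torch.randn(4), torch.tensor(1), torch.randn(4)) for _ in range(5)]
pairs = hindsight_inverse_dynamics_pairs(trajectory, achieved_goal=lambda s: s[:2])

inputs = torch.stack([torch.cat([s, g]) for (s, g), _ in pairs])
actions = torch.stack([a for _, a in pairs])
loss = loss_fn(policy(inputs), actions)
loss.backward()
```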

Modeling competitive evolution of multiple languages

Title Modeling competitive evolution of multiple languages
Authors Zejie Zhou, Boleslaw K. Szymanski, Jianxi Gao
Abstract Increasing evidence demonstrates that in many places language coexistence has become ubiquitous, is essential for supporting language and cultural diversity, and is associated with financial and economic benefits. The competitive evolution among multiple languages determines the outcome: coexistence, decline, or extinction. Here, we extend the Abrams-Strogatz model of language competition to multiple languages and then validate it by analyzing the behavioral transitions of language usage over recent decades in Singapore and Hong Kong. In each case, we estimate from data the model parameters that measure each language’s utility for its speakers and the strength of two biases: the majority preference for their language, and the minority aversion to it. The values of these two biases decide which language grows fastest in the competition and what the stable state of the system will be. We also study the system’s convergence time to stable states and discover the existence of tipping points with multiple attractors. Moreover, the critical slowdown of convergence to the stable fractions of language users appears near, and peaks at, the tipping points, signaling when the system approaches them. Our analysis furthers our understanding of multiple language evolution and the role of tipping points in behavioral transitions. These insights may help to protect languages from extinction and retain language and cultural diversity.
Tasks
Published 2019-07-16
URL https://arxiv.org/abs/1907.06848v1
PDF https://arxiv.org/pdf/1907.06848v1.pdf
PWC https://paperswithcode.com/paper/modeling-competitive-evolution-of-multiple
Repo
Framework
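
For reference, the classic two-language Abrams-Strogatz dynamics that the paper generalizes can be simulated in a few lines. The parameter values below are illustrative, and the paper's multi-language extension with majority-preference and minority-aversion biases is not reproduced here.

```python
import numpy as np

def abrams_strogatz(x0, s=0.6, a=1.31, c=1.0, dt=0.01, steps=5000):
    """Classic two-language Abrams-Strogatz dynamics.  x is the fraction
    speaking language X, s its relative status/utility, a the volatility
    exponent; dx/dt = c(1-x)s x^a - c x (1-s)(1-x)^a."""
    x = x0
    traj = [x]
    for _ in range(steps):
        dx = c * (1 - x) * s * x**a - c * x * (1 - s) * (1 - x)**a
        x = float(np.clip(x + dt * dx, 0.0, 1.0))
        traj.append(x)
    return np.array(traj)

traj = abrams_strogatz(x0=0.4)
print(traj[-1])  # with s > 0.5 the higher-status language tends to take over
```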

Locality Aware Appearance Metric for Multi-Target Multi-Camera Tracking

Title Locality Aware Appearance Metric for Multi-Target Multi-Camera Tracking
Authors Yunzhong Hou, Liang Zheng, Zhongdao Wang, Shengjin Wang
Abstract Multi-target multi-camera tracking (MTMCT) systems track targets across cameras. Due to the continuity of target trajectories, tracking systems usually restrict their data association to a local neighborhood. In single camera tracking, the local neighborhood refers to consecutive frames; in multi-camera tracking, it refers to neighboring cameras in which the target may appear successively. For similarity estimation, tracking systems often adopt appearance features learned from the re-identification (re-ID) perspective. Different from tracking, re-ID usually does not have access to the trajectory cues that can limit the search space to a local neighborhood. Due to its global matching property, the re-ID perspective requires learning global appearance features. We argue that the mismatch between the local matching procedure in tracking and the global nature of re-ID appearance features may compromise MTMCT performance. To fit the local matching procedure in MTMCT, in this work we introduce the locality aware appearance metric (LAAM). Specifically, we design an intra-camera metric for single camera tracking and an inter-camera metric for multi-camera tracking. Both metrics are trained with data pairs sampled from their corresponding local neighborhoods, as opposed to the global sampling of the re-ID perspective. We show that the locally learned metrics can be successfully applied on top of several globally learned re-ID features. With the proposed method, we report new state-of-the-art performance on the DukeMTMC dataset and a substantial improvement on the CityFlow dataset.
Tasks
Published 2019-11-27
URL https://arxiv.org/abs/1911.12037v1
PDF https://arxiv.org/pdf/1911.12037v1.pdf
PWC https://paperswithcode.com/paper/locality-aware-appearance-metric-for-multi
Repo
Framework
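
The key training difference from re-ID is the sampling scheme; the sketch below draws intra-camera pairs only from a temporal local neighborhood. The field names, frame-gap threshold, and tracklet representation are assumptions for illustration, not the authors' implementation.

```python
import random

def sample_local_pairs(tracklets, max_frame_gap=50, n_pairs=1000, max_tries=100_000):
    """Sample metric-learning pairs only from a temporal local neighborhood
    (the intra-camera case): two tracklets from the same camera whose frame
    ranges lie within `max_frame_gap` frames of each other.

    Each tracklet is a dict with keys: cam, pid, start_frame, end_frame, feat."""
    pairs = []
    for _ in range(max_tries):
        if len(pairs) >= n_pairs:
            break
        a, b = random.sample(tracklets, 2)
        if a["cam"] != b["cam"]:
            continue
        if abs(a["end_frame"] - b["start_frame"]) > max_frame_gap:
            continue
        label = int(a["pid"] == b["pid"])            # positive if same identity
        pairs.append((a["feat"], b["feat"], label))
    return pairs

tracklets = [
    {"cam": 1, "pid": 7, "start_frame": 0,  "end_frame": 40, "feat": [0.1, 0.2]},
    {"cam": 1, "pid": 7, "start_frame": 60, "end_frame": 90, "feat": [0.1, 0.3]},
    {"cam": 2, "pid": 7, "start_frame": 10, "end_frame": 50, "feat": [0.2, 0.2]},
]
print(len(sample_local_pairs(tracklets, n_pairs=5)))
```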

Learning Optimal Decision Trees from Large Datasets

Title Learning Optimal Decision Trees from Large Datasets
Authors Florent Avellaneda
Abstract Inferring a decision tree from a given dataset is one of the classic problems in machine learning. The problem consists of building, from a labelled dataset, a tree where each node corresponds to a class and a path from the root to a leaf corresponds to a conjunction of features to be satisfied in that class. Following the principle of parsimony, we want to infer a minimal tree consistent with the dataset. Unfortunately, inferring an optimal decision tree is known to be NP-complete for several definitions of optimality. Hence, the majority of existing approaches rely on heuristics, and the few exact inference approaches do not work on large datasets. In this paper, we propose a novel approach for inferring a decision tree of minimum depth based on the incremental generation of Boolean formulas. The experimental results indicate that it scales sufficiently well and that its running time grows slowly with the size of the dataset.
Tasks
Published 2019-04-12
URL http://arxiv.org/abs/1904.06314v1
PDF http://arxiv.org/pdf/1904.06314v1.pdf
PWC https://paperswithcode.com/paper/learning-optimal-decision-trees-from-large
Repo
Framework
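
To make the objective concrete, the sketch below finds the minimum depth at which a tree consistent with a small binary dataset exists, using naive exhaustive search. The paper instead encodes the same question as an incrementally generated Boolean formula handed to a SAT solver, which is what lets it scale to large datasets.

```python
def fits_in_depth(rows, depth):
    """Can a decision tree with at most `depth` splits classify `rows` exactly?
    rows: list of (features, label) with binary features (tuples of 0/1).
    Naive exhaustive search over splitting features, for illustration only."""
    labels = {lab for _, lab in rows}
    if len(labels) <= 1:
        return True
    if depth == 0:
        return False
    n_features = len(rows[0][0])
    for f in range(n_features):
        left = [r for r in rows if r[0][f] == 0]
        right = [r for r in rows if r[0][f] == 1]
        if not left or not right:
            continue
        if fits_in_depth(left, depth - 1) and fits_in_depth(right, depth - 1):
            return True
    return False

def minimal_depth(rows):
    # Assumes the dataset is consistent (no identical feature vectors with different labels).
    d = 0
    while not fits_in_depth(rows, d):
        d += 1
    return d

# XOR of two binary features needs depth 2.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(minimal_depth(data))  # 2
```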

Multimodal music information processing and retrieval: survey and future challenges

Title Multimodal music information processing and retrieval: survey and future challenges
Authors Federico Simonetta, Stavros Ntalampiras, Federico Avanzini
Abstract Towards improving performance in various music information processing tasks, recent studies exploit different modalities able to capture diverse aspects of music. Such modalities include audio recordings, symbolic music scores, mid-level representations, motion and gestural data, video recordings, editorial or cultural tags, lyrics, and album cover art. This paper critically reviews the various approaches adopted in Music Information Processing and Retrieval and highlights how multimodal algorithms can help Music Computing applications. First, we categorize the related literature based on the application addressed. Subsequently, we analyze existing information fusion approaches, and we conclude with the set of challenges that the Music Information Retrieval and Sound and Music Computing research communities should focus on in the coming years.
Tasks Information Retrieval, Music Information Retrieval
Published 2019-02-14
URL http://arxiv.org/abs/1902.05347v1
PDF http://arxiv.org/pdf/1902.05347v1.pdf
PWC https://paperswithcode.com/paper/multimodal-music-information-processing-and
Repo
Framework

Exploiting Synchronized Lyrics And Vocal Features For Music Emotion Detection

Title Exploiting Synchronized Lyrics And Vocal Features For Music Emotion Detection
Authors Loreto Parisi, Simone Francia, Silvio Olivastri, Maria Stella Tavella
Abstract One of the key points in music recommendation is authoring engaging playlists according to sentiment and emotions. While previous works were mostly based on audio for music discovery and playlist generation, we take advantage of our synchronized lyrics dataset to combine text representations and music features in a novel way; we therefore introduce the Synchronized Lyrics Emotion Dataset. Unlike other approaches that randomly exploited the audio samples and the whole text, our data is split according to the temporal information provided by the synchronization between lyrics and audio. This work presents a comparison between text-based and audio-based deep learning classification models using different techniques from the Natural Language Processing and Music Information Retrieval domains. From the experiments on audio we conclude that using vocals only, instead of the whole audio track, improves the overall performance of the audio classifier. In the lyrics experiments we exploit state-of-the-art word representations applied to the main deep learning architectures available in the literature. In our benchmarks, the results show that the Bilinear LSTM classifier with attention based on fastText word embeddings performs better than the CNN applied to audio.
Tasks Information Retrieval, Music Information Retrieval
Published 2019-01-15
URL http://arxiv.org/abs/1901.04831v1
PDF http://arxiv.org/pdf/1901.04831v1.pdf
PWC https://paperswithcode.com/paper/exploiting-synchronized-lyrics-and-vocal
Repo
Framework
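
A hedged sketch of a bidirectional LSTM with additive attention pooling over pretrained word embeddings of the fastText kind mentioned in the abstract; the vocabulary size, hidden size, and number of emotion classes below are placeholder values, not the paper's configuration.

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    """Bidirectional LSTM with additive attention pooling for emotion
    classification over lyric segments (a sketch, not the paper's exact model)."""
    def __init__(self, embeddings, hidden=128, n_classes=4):
        super().__init__()
        # embeddings: (vocab_size, emb_dim) tensor of pretrained fastText vectors
        self.emb = nn.Embedding.from_pretrained(embeddings, freeze=True)
        self.lstm = nn.LSTM(embeddings.size(1), hidden,
                            batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))          # (B, T, 2H)
        weights = torch.softmax(self.attn(h), dim=1)   # (B, T, 1) attention over time
        context = (weights * h).sum(dim=1)             # attention-pooled summary
        return self.out(context)

# Hypothetical vocabulary of 10k tokens with 300-d vectors.
model = BiLSTMAttention(torch.randn(10_000, 300))
logits = model(torch.randint(0, 10_000, (8, 50)))      # batch of 8 lyric segments
print(logits.shape)                                    # torch.Size([8, 4])
```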

Sentence-Level BERT and Multi-Task Learning of Age and Gender in Social Media

Title Sentence-Level BERT and Multi-Task Learning of Age and Gender in Social Media
Authors Muhammad Abdul-Mageed, Chiyu Zhang, Arun Rajendran, AbdelRahim Elmadany, Michael Przystupa, Lyle Ungar
Abstract Social media currently provide a window on our lives, making it possible to learn how people from different places, with different backgrounds, ages, and genders use language. In this work we exploit a newly-created Arabic dataset with ground-truth age and gender labels to learn these attributes both individually and in a multi-task setting at the sentence level. Our models are based on variations of deep bidirectional neural networks. More specifically, we build models with gated recurrent units and bidirectional encoder representations from transformers (BERT). We show the utility of multi-task learning (MTL) on the two tasks and identify task-specific attention as a superior choice in this context. We also find that a single-task BERT model outperforms our best MTL models on the two tasks. We report tweet-level accuracy of 51.43% on the age task (three-way) and 65.30% on the gender task (binary), both of which outperform our baselines by a large margin. Our models are language-agnostic, and so can be applied to other languages.
Tasks Multi-Task Learning
Published 2019-11-02
URL https://arxiv.org/abs/1911.00637v1
PDF https://arxiv.org/pdf/1911.00637v1.pdf
PWC https://paperswithcode.com/paper/sentence-level-bert-and-multi-task-learning
Repo
Framework
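
To illustrate the multi-task setup, the sketch below shares one sentence encoder between a three-way age head and a binary gender head. A plain GRU stands in for the paper's GRU and BERT encoders, and all sizes are assumptions chosen only to keep the example self-contained.

```python
import torch
import torch.nn as nn

class SharedEncoderMTL(nn.Module):
    """Minimal multi-task setup: one shared sentence encoder, one head per task
    (3-way age, binary gender)."""
    def __init__(self, vocab_size=30_000, emb_dim=128, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.age_head = nn.Linear(2 * hidden, 3)
        self.gender_head = nn.Linear(2 * hidden, 2)

    def forward(self, token_ids):
        h, _ = self.encoder(self.emb(token_ids))
        pooled = h.mean(dim=1)                 # mean-pool over tokens
        return self.age_head(pooled), self.gender_head(pooled)

model = SharedEncoderMTL()
tokens = torch.randint(0, 30_000, (4, 32))     # batch of 4 tokenized sentences
age_logits, gender_logits = model(tokens)
loss = (nn.functional.cross_entropy(age_logits, torch.randint(0, 3, (4,)))
        + nn.functional.cross_entropy(gender_logits, torch.randint(0, 2, (4,))))
loss.backward()
```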

iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images

Title iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images
Authors Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, Xiang Bai
Abstract Existing Earth Vision datasets are either suitable for semantic segmentation or object detection. In this work, we introduce the first benchmark dataset for instance segmentation in aerial imagery that combines instance-level object detection and pixel-level segmentation tasks. In comparison to instance segmentation in natural scenes, aerial images present unique challenges, e.g., a huge number of instances per image, large object-scale variations, and abundant tiny objects. Our large-scale and densely annotated Instance Segmentation in Aerial Images Dataset (iSAID) comes with 655,451 object instances for 15 categories across 2,806 high-resolution images. Such precise per-pixel annotations for each instance ensure accurate localization that is essential for detailed scene analysis. Compared to existing small-scale aerial image based instance segmentation datasets, iSAID contains 15$\times$ the number of object categories and 5$\times$ the number of instances. We benchmark our dataset using two popular instance segmentation approaches for natural images, namely Mask R-CNN and PANet. Our experiments show that direct application of off-the-shelf Mask R-CNN and PANet to aerial images provides suboptimal instance segmentation results, thus requiring specialized solutions from the research community. The dataset is publicly available at: https://captain-whu.github.io/iSAID/index.html
Tasks Instance Segmentation, Object Detection, Semantic Segmentation
Published 2019-05-30
URL https://arxiv.org/abs/1905.12886v2
PDF https://arxiv.org/pdf/1905.12886v2.pdf
PWC https://paperswithcode.com/paper/isaid-a-large-scale-dataset-for-instance
Repo
Framework

An amplified-target loss approach for photoreceptor layer segmentation in pathological OCT scans

Title An amplified-target loss approach for photoreceptor layer segmentation in pathological OCT scans
Authors José Ignacio Orlando, Anna Breger, Hrvoje Bogunović, Sophie Riedl, Bianca S. Gerendas, Martin Ehler, Ursula Schmidt-Erfurth
Abstract Segmenting anatomical structures such as the photoreceptor layer in retinal optical coherence tomography (OCT) scans is challenging in pathological scenarios. Supervised deep learning models trained with standard loss functions are usually able to characterize only the most common disease appearance from a training set, resulting in suboptimal performance and poor generalization when dealing with unseen lesions. In this paper we propose to overcome this limitation by means of an augmented target loss function framework. We introduce a novel amplified-target loss that explicitly penalizes errors within the central area of the input images, based on the observation that most of the challenging disease appearance is usually located in this area. We experimentally validated our approach using a dataset of OCT scans of patients with macular diseases. We observe increased performance compared to the models that use only the standard losses. Our proposed loss function strongly supports the segmentation model in better distinguishing photoreceptors in highly pathological scenarios.
Tasks
Published 2019-08-02
URL https://arxiv.org/abs/1908.00764v2
PDF https://arxiv.org/pdf/1908.00764v2.pdf
PWC https://paperswithcode.com/paper/an-amplified-target-loss-approach-for
Repo
Framework
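
A sketch of the general idea of spatially up-weighting the loss in the central area of the input; the choice of binary cross-entropy, the column-wise weighting, and the amplification factor are assumptions and may differ from the paper's augmented-target formulation.

```python
import torch
import torch.nn.functional as F

def center_amplified_bce(logits, targets, center_frac=0.5, amplification=4.0):
    """Binary segmentation loss that up-weights errors in the central columns
    of the scan (illustrating the amplified-target idea; the paper's exact
    weighting scheme may differ)."""
    b, c, h, w = logits.shape
    weights = torch.ones(1, 1, 1, w, device=logits.device)
    lo = int(w * (1 - center_frac) / 2)
    hi = w - lo
    weights[..., lo:hi] = amplification   # amplify the central area
    per_pixel = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (weights * per_pixel).mean()

logits = torch.randn(2, 1, 64, 128, requires_grad=True)
targets = (torch.rand(2, 1, 64, 128) > 0.5).float()
loss = center_amplified_bce(logits, targets)
loss.backward()
```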

Investigating Biases in Textual Entailment Datasets

Title Investigating Biases in Textual Entailment Datasets
Authors Shawn Tan, Yikang Shen, Chin-wei Huang, Aaron Courville
Abstract The ability to understand logical relationships between sentences is an important task in language understanding. To aid progress on this task, researchers have collected datasets for machine learning and evaluation of current systems. However, as in the crowdsourced Visual Question Answering (VQA) task, some biases in the data inevitably occur. In our experiments, we find that performing classification on just the hypotheses of the SNLI dataset yields an accuracy of 64%. We analyze the extent of this bias in the SNLI and MultiNLI datasets, discuss its implications, and propose a simple method to reduce the biases in the datasets.
Tasks Natural Language Inference, Question Answering, Visual Question Answering
Published 2019-06-23
URL https://arxiv.org/abs/1906.09635v1
PDF https://arxiv.org/pdf/1906.09635v1.pdf
PWC https://paperswithcode.com/paper/investigating-biases-in-textual-entailment
Repo
Framework
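
The bias probe can be reproduced in spirit with a hypothesis-only classifier; the sketch below uses a TF-IDF plus logistic-regression pipeline on toy sentences. The paper's probe and the reported 64% figure come from stronger models run on the actual SNLI hypotheses, so this is illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

def hypothesis_only_probe(train_hyps, train_labels, test_hyps, test_labels):
    """Train a classifier on hypotheses alone (premises withheld).  Accuracy far
    above the majority-class baseline signals annotation artifacts of the kind
    the paper reports."""
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        LogisticRegression(max_iter=1000))
    clf.fit(train_hyps, train_labels)
    return accuracy_score(test_labels, clf.predict(test_hyps))

# Toy made-up sentences; real runs would use SNLI/MultiNLI hypotheses.
acc = hypothesis_only_probe(
    ["A man is sleeping.", "Nobody is outside.", "A dog plays fetch."],
    ["entailment", "contradiction", "neutral"],
    ["A woman is reading."], ["neutral"])
print(acc)
```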

High probability generalization bounds for uniformly stable algorithms with nearly optimal rate

Title High probability generalization bounds for uniformly stable algorithms with nearly optimal rate
Authors Vitaly Feldman, Jan Vondrak
Abstract Algorithmic stability is a classical approach to understanding and analysis of the generalization error of learning algorithms. A notable weakness of most stability-based generalization bounds is that they hold only in expectation. Generalization with high probability has been established in a landmark paper of Bousquet and Elisseeff (2002) albeit at the expense of an additional $\sqrt{n}$ factor in the bound. Specifically, their bound on the estimation error of any $\gamma$-uniformly stable learning algorithm on $n$ samples and range in $[0,1]$ is $O(\gamma \sqrt{n \log(1/\delta)} + \sqrt{\log(1/\delta)/n})$ with probability $\geq 1-\delta$. The $\sqrt{n}$ overhead makes the bound vacuous in the common settings where $\gamma \geq 1/\sqrt{n}$. A stronger bound was recently proved by the authors (Feldman and Vondrak, 2018) that reduces the overhead to at most $O(n^{1/4})$. Still, both of these results give optimal generalization bounds only when $\gamma = O(1/n)$. We prove a nearly tight bound of $O(\gamma \log(n)\log(n/\delta) + \sqrt{\log(1/\delta)/n})$ on the estimation error of any $\gamma$-uniformly stable algorithm. It implies that for algorithms that are uniformly stable with $\gamma = O(1/\sqrt{n})$, estimation error is essentially the same as the sampling error. Our result leads to the first high-probability generalization bounds for multi-pass stochastic gradient descent and regularized ERM for stochastic convex problems with nearly optimal rate — resolving open problems in prior work. Our proof technique is new and we introduce several analysis tools that might find additional applications.
Tasks
Published 2019-02-27
URL https://arxiv.org/abs/1902.10710v2
PDF https://arxiv.org/pdf/1902.10710v2.pdf
PWC https://paperswithcode.com/paper/high-probability-generalization-bounds-for
Repo
Framework
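
For readability, the bounds quoted in the abstract, each holding with probability at least $1-\delta$ for a $\gamma$-uniformly stable algorithm on $n$ samples, can be set side by side:

```latex
\begin{align*}
\text{Bousquet and Elisseeff (2002):}\quad
  & O\!\left(\gamma \sqrt{n \log(1/\delta)} + \sqrt{\log(1/\delta)/n}\right) \\
\text{Feldman and Vondrak (2018):}\quad
  & \text{overhead reduced to at most } O(n^{1/4}) \\
\text{This paper:}\quad
  & O\!\left(\gamma \log(n)\log(n/\delta) + \sqrt{\log(1/\delta)/n}\right)
\end{align*}
```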