Paper Group ANR 914
Class-Agnostic Counting. Clustering Player Strategies from Variable-Length Game Logs in Dominion. Dealing with Difficult Minority Labels in Imbalanced Mutilabel Data Sets. Voice Conversion with Conditional SampleRNN. Describe and Attend to Track: Learning Natural Language guided Structural Representation and Visual Attention for Object Tracking. Financial Risk and Returns Prediction with Modular Networked Learning. Spectral feature mapping with mimic loss for robust speech recognition. A Multi-task Selected Learning Approach for Solving 3D Flexible Bin Packing Problem. Learn to Code-Switch: Data Augmentation using Copy Mechanism on Language Modeling. Multi-set Canonical Correlation Analysis simply explained. Bounding Box Embedding for Single Shot Person Instance Segmentation. Manipulating and Measuring Model Interpretability. Semantically-Aligned Equation Generation for Solving and Reasoning Math Word Problems. Asymptotic Soft Filter Pruning for Deep Convolutional Neural Networks. Automating Generation of Low Precision Deep Learning Operators.
Class-Agnostic Counting
Title | Class-Agnostic Counting |
Authors | Erika Lu, Weidi Xie, Andrew Zisserman |
Abstract | Nearly all existing counting methods are designed for a specific object class. Our work, however, aims to create a counting model able to count any class of object. To achieve this goal, we formulate counting as a matching problem, enabling us to exploit the image self-similarity property that naturally exists in object counting problems. We make the following three contributions: first, a Generic Matching Network (GMN) architecture that can potentially count any object in a class-agnostic manner; second, by reformulating the counting problem as one of matching objects, we can take advantage of the abundance of video data labeled for tracking, which contains natural repetitions suitable for training a counting model. Such data enables us to train the GMN. Third, to customize the GMN to different user requirements, an adapter module is used to specialize the model with minimal effort, i.e. using a few labeled examples, and adapting only a small fraction of the trained parameters. This is a form of few-shot learning, which is practical for domains where labels are limited due to requiring expert knowledge (e.g. microbiology). We demonstrate the flexibility of our method on a diverse set of existing counting benchmarks: specifically cells, cars, and human crowds. The model achieves competitive performance on cell and crowd counting datasets, and surpasses the state-of-the-art on the car dataset using only three training images. When training on the entire dataset, the proposed method outperforms all previous methods by a large margin. |
Tasks | Crowd Counting, Few-Shot Learning, Object Counting |
Published | 2018-11-01 |
URL | http://arxiv.org/abs/1811.00472v1 |
PDF | http://arxiv.org/pdf/1811.00472v1.pdf |
PWC | https://paperswithcode.com/paper/class-agnostic-counting |
Repo | |
Framework | |
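To make the counting-as-matching idea above concrete, here is a minimal, self-contained sketch (our own simplification, not the GMN architecture): an exemplar patch is correlated against the image and strong, non-overlapping matches are counted. The threshold and function names are illustrative assumptions.

```python
# A minimal sketch of counting-as-matching (our simplification, NOT the GMN):
# slide an exemplar over the image, score windows by normalized cross-
# correlation, and count strong non-overlapping matches.
import numpy as np

def match_count(image, exemplar, threshold=0.9):
    h, w = exemplar.shape
    ex = (exemplar - exemplar.mean()) / (exemplar.std() + 1e-8)
    H, W = image.shape
    scores = np.full((H - h + 1, W - w + 1), -1.0)
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            win = image[i:i + h, j:j + w]
            win = (win - win.mean()) / (win.std() + 1e-8)
            scores[i, j] = (win * ex).mean()  # NCC score in [-1, 1]
    count = 0
    while scores.max() >= threshold:          # greedy non-maximum suppression
        i, j = np.unravel_index(scores.argmax(), scores.shape)
        count += 1
        scores[max(0, i - h):i + h, max(0, j - w):j + w] = -1.0
    return count

rng = np.random.default_rng(0)
obj = rng.random((4, 4))                      # a textured exemplar
img = np.zeros((32, 32)); img[4:8, 4:8] = obj; img[20:24, 10:14] = obj
print(match_count(img, obj))                  # 2
```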
Clustering Player Strategies from Variable-Length Game Logs in Dominion
Title | Clustering Player Strategies from Variable-Length Game Logs in Dominion |
Authors | Henry Bendekgey |
Abstract | We present a method for encoding game logs as numeric features in the card game Dominion. We then run the manifold learning algorithm t-SNE on these encodings to visualize the landscape of player strategies. By quantifying game states as the relative prevalence of cards in a player’s deck, we create visualizations that capture qualitative differences in player strategies. Different ways of deviating from the starting game state appear as different rays in the visualization, giving it an intuitive explanation. This is a promising new direction for understanding player strategies across games that vary in length. |
Tasks | |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.11273v2 |
PDF | http://arxiv.org/pdf/1811.11273v2.pdf |
PWC | https://paperswithcode.com/paper/clustering-player-strategies-from-variable |
Repo | |
Framework | |
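A hedged sketch of the pipeline as we read the abstract: decks of any size become fixed-length vectors of relative card prevalence, and t-SNE embeds those vectors in 2D for visualization. The card names and toy decks below are our own illustrations, not the paper's data.

```python
# Variable-length game logs -> fixed-length relative-prevalence vectors -> t-SNE.
import numpy as np
from sklearn.manifold import TSNE

CARDS = ["Copper", "Silver", "Gold", "Estate", "Smithy", "Village"]

def deck_to_features(deck_counts):
    """Normalize raw card counts into relative prevalence (sums to 1)."""
    v = np.array([deck_counts.get(c, 0) for c in CARDS], dtype=float)
    return v / max(v.sum(), 1.0)

decks = [
    {"Copper": 7, "Estate": 3},                  # the starting deck
    {"Copper": 7, "Silver": 3, "Smithy": 2},     # a "big money" deviation
    {"Copper": 7, "Village": 4, "Smithy": 3},    # an engine deviation
] * 10                                           # t-SNE wants more points

X = np.stack([deck_to_features(d) for d in decks])
emb = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(X)
print(emb.shape)                                 # (30, 2) coordinates to plot
```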
Dealing with Difficult Minority Labels in Imbalanced Mutilabel Data Sets
Title | Dealing with Difficult Minority Labels in Imbalanced Mutilabel Data Sets |
Authors | Francisco Charte, Antonio J. Rivera, María J. del Jesus, Francisco Herrera |
Abstract | Multilabel classification is an emerging data mining task with a broad range of real world applications. Learning from imbalanced multilabel data has lately been studied in depth, and several resampling methods have been proposed in the literature. The unequal label distribution in most multilabel datasets, with disparate imbalance levels, can be a handicap when learning new classifiers. In addition, this characteristic challenges many of the existing preprocessing algorithms. Furthermore, the concurrence between imbalanced labels can make learning from certain labels harder. These are what we call \textit{difficult} labels. In this work, the problem of difficult labels is analyzed in depth, its influence on multilabel classifiers is studied, and a novel way to solve this problem is proposed. Specific metrics to assess this trait in multilabel datasets, called \textit{SCUMBLE} (\textit{Score of ConcUrrence among iMBalanced LabEls}) and \textit{SCUMBLELbl}, are presented along with REMEDIAL (\textit{REsampling MultilabEl datasets by Decoupling highly ImbAlanced Labels}), a new algorithm aimed at relaxing label concurrence. How to deal with this problem using the R mldr package is also outlined. |
Tasks | |
Published | 2018-02-14 |
URL | http://arxiv.org/abs/1802.05033v1 |
PDF | http://arxiv.org/pdf/1802.05033v1.pdf |
PWC | https://paperswithcode.com/paper/dealing-with-difficult-minority-labels-in |
Repo | |
Framework | |
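For intuition, here is a sketch of the SCUMBLE metric as we understand it: per instance, one minus the ratio of the geometric to the arithmetic mean of the imbalance ratios (IRLbl) of its active labels, averaged over instances, so it grows when rare and frequent labels co-occur. This is our reading of the definition; the R mldr package is the reference implementation.

```python
# Hedged SCUMBLE sketch; verify against the R mldr package for exactness.
import numpy as np

def scumble(Y):
    """Y: (n_instances, n_labels) binary label matrix."""
    counts = Y.sum(axis=0).astype(float)
    irlbl = counts.max() / np.maximum(counts, 1.0)   # imbalance ratio per label
    scores = []
    for row in Y:
        active = irlbl[row.astype(bool)]
        if active.size == 0:
            scores.append(0.0)
            continue
        geo = np.exp(np.log(active).mean())          # geometric mean
        scores.append(1.0 - geo / active.mean())     # 0 if all IRLbl are equal
    return float(np.mean(scores))

Y = np.array([[1, 0, 1], [1, 0, 0], [1, 1, 0], [1, 0, 0]])
print(scumble(Y))  # > 0: the frequent first label co-occurs with rare ones
```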
Voice Conversion with Conditional SampleRNN
Title | Voice Conversion with Conditional SampleRNN |
Authors | Cong Zhou, Michael Horgan, Vivek Kumar, Cristina Vasco, Dan Darcy |
Abstract | Here we present a novel approach to conditioning the SampleRNN generative model for voice conversion (VC). Conventional methods for VC modify the perceived speaker identity by converting between source and target acoustic features. Our approach focuses on preserving voice content and depends on the generative network to learn voice style. We first train a multi-speaker SampleRNN model conditioned on linguistic features, pitch contour, and speaker identity using a multi-speaker speech corpus. Voice-converted speech is generated using linguistic features and pitch contour extracted from the source speaker, and the target speaker identity. We demonstrate that our system is capable of many-to-many voice conversion without requiring parallel data, enabling broad applications. Subjective evaluation demonstrates that our approach outperforms conventional VC methods. |
Tasks | Voice Conversion |
Published | 2018-08-24 |
URL | http://arxiv.org/abs/1808.08311v1 |
PDF | http://arxiv.org/pdf/1808.08311v1.pdf |
PWC | https://paperswithcode.com/paper/voice-conversion-with-conditional-samplernn |
Repo | |
Framework | |
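An illustrative sketch (our own simplification, not the authors' model) of the conditioning pathway described above: frame-level linguistic features and pitch contour come from the source utterance, while a learned speaker embedding selects the target voice. All dimensions are assumptions.

```python
# Conditioning pathway sketch for a conditional sample-level generative model.
import torch
import torch.nn as nn

class ConditioningEncoder(nn.Module):
    def __init__(self, n_linguistic=40, n_speakers=10, d_spk=16, d_out=64):
        super().__init__()
        self.spk = nn.Embedding(n_speakers, d_spk)
        self.proj = nn.Linear(n_linguistic + 1 + d_spk, d_out)

    def forward(self, linguistic, f0, speaker_id):
        # linguistic: (T, n_linguistic); f0: (T, 1); speaker_id: 0-dim long
        spk = self.spk(speaker_id).expand(linguistic.size(0), -1)
        return torch.tanh(self.proj(torch.cat([linguistic, f0, spk], dim=-1)))

enc = ConditioningEncoder()
cond = enc(torch.randn(100, 40), torch.rand(100, 1), torch.tensor(3))
print(cond.shape)  # (100, 64): frame-level conditioning for the sample RNN
```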
Describe and Attend to Track: Learning Natural Language guided Structural Representation and Visual Attention for Object Tracking
Title | Describe and Attend to Track: Learning Natural Language guided Structural Representation and Visual Attention for Object Tracking |
Authors | Xiao Wang, Chenglong Li, Rui Yang, Tianzhu Zhang, Jin Tang, Bin Luo |
Abstract | The tracking-by-detection framework requires a set of positive and negative training samples to learn robust tracking models for precise localization of target objects. However, existing tracking models mostly treat samples independently, ignoring the relationships among them. In this paper, we propose a novel structure-aware deep neural network to overcome this limitation. In particular, we construct a graph to represent the pairwise relationships among training samples, and additionally use natural language as supervision to learn both feature representations and classifiers robustly. To refine the target's state and re-acquire the target when it returns to view after heavy occlusion or leaving the frame, we design a novel subnetwork that learns target-driven visual attention from the guidance of both visual and natural language cues. Extensive experiments on five tracking benchmark datasets validate the effectiveness of the proposed method. |
Tasks | Object Tracking |
Published | 2018-11-25 |
URL | http://arxiv.org/abs/1811.10014v2 |
PDF | http://arxiv.org/pdf/1811.10014v2.pdf |
PWC | https://paperswithcode.com/paper/describe-and-attend-to-track-learning-natural |
Repo | |
Framework | |
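One loose reading of the "structure-aware" idea, sketched below under our own assumptions: connect related training samples in a graph and penalize connected samples whose learned representations differ. This is a generic Laplacian-style smoothness term, not the authors' exact loss.

```python
# Graph-regularized representation learning sketch (our assumption).
import torch

def graph_smoothness_loss(features, adjacency):
    # features: (n, d) sample representations; adjacency: (n, n) edge weights
    diff = features.unsqueeze(0) - features.unsqueeze(1)          # (n, n, d)
    return (adjacency * diff.pow(2).sum(-1)).sum() / adjacency.sum().clamp(min=1e-8)

feats = torch.randn(8, 32, requires_grad=True)
A = (torch.rand(8, 8) > 0.7).float()        # toy pairwise-relationship graph
graph_smoothness_loss(feats, A).backward()  # gradients flow to the features
```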
Financial Risk and Returns Prediction with Modular Networked Learning
Title | Financial Risk and Returns Prediction with Modular Networked Learning |
Authors | Carlos Pedro Gonçalves |
Abstract | An artificial agent for financial risk and returns' prediction is built with a modular cognitive system comprised of interconnected recurrent neural networks: the agent learns to predict financial returns, and learns to predict the squared deviation around these predicted returns. These two expectations are used to build a volatility-sensitive interval prediction for financial returns, which is evaluated on three major financial indices and shown to predict financial returns with a success rate above 80% in both training and testing, calling into question the Efficient Market Hypothesis. The agent is introduced as an example of a class of artificial intelligent systems equipped with a Modular Networked Learning cognitive system, defined as an integrated networked system of machine learning modules, where each module constitutes a functional unit trained for a specific task that solves a subproblem of a complex main problem expressed as a network of linked subproblems. In the case of neural networks, these systems function as a form of "artificial brain", where each module is like a specialized brain region comprised of a neural network with a specific architecture. |
Tasks | |
Published | 2018-06-15 |
URL | http://arxiv.org/abs/1806.05876v1 |
PDF | http://arxiv.org/pdf/1806.05876v1.pdf |
PWC | https://paperswithcode.com/paper/financial-risk-and-returns-prediction-with |
Repo | |
Framework | |
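Once the two modules are trained, the construction above reduces to a simple recipe: one network predicts the return, the other the squared deviation around it, and the interval is the mean plus or minus a multiple of the implied standard deviation. The multiplier below is our choice, not the paper's.

```python
# Volatility-sensitive interval from the two module outputs.
import numpy as np

def interval_prediction(pred_return, pred_sq_dev, k=2.0):
    """Return (lower, upper): mean +/- k * predicted standard deviation."""
    std = np.sqrt(np.maximum(pred_sq_dev, 0.0))
    return pred_return - k * std, pred_return + k * std

lo, hi = interval_prediction(np.array([0.001]), np.array([4e-6]))
print(lo, hi)  # [-0.003] [0.005]
```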
Spectral feature mapping with mimic loss for robust speech recognition
Title | Spectral feature mapping with mimic loss for robust speech recognition |
Authors | Deblin Bagchi, Peter Plantinga, Adam Stiff, Eric Fosler-Lussier |
Abstract | For the task of speech enhancement, local learning objectives are agnostic to phonetic structures helpful for speech recognition. We propose to add a global criterion to ensure de-noised speech is useful for downstream tasks like ASR. We first train a spectral classifier on clean speech to predict senone labels. Then, the spectral classifier is joined with our speech enhancer as a noisy speech recognizer. This model is taught to imitate the output of the spectral classifier alone on clean speech. This \textit{mimic loss} is combined with the traditional local criterion to train the speech enhancer to produce de-noised speech. Feeding the de-noised speech to an off-the-shelf Kaldi training recipe for the CHiME-2 corpus shows significant improvements in WER. |
Tasks | Robust Speech Recognition, Speech Enhancement, Speech Recognition |
Published | 2018-03-26 |
URL | http://arxiv.org/abs/1803.09816v1 |
PDF | http://arxiv.org/pdf/1803.09816v1.pdf |
PWC | https://paperswithcode.com/paper/spectral-feature-mapping-with-mimic-loss-for |
Repo | |
Framework | |
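A compact sketch of the combined objective as described: a local spectral criterion on the denoised output plus a "mimic" term that matches the frozen senone classifier's outputs on denoised speech to its outputs on clean speech. The weight alpha and function names are assumptions.

```python
# Mimic loss sketch: local fidelity plus a global, classifier-matching term.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mimic_loss(enhancer, frozen_classifier, noisy, clean, alpha=1.0):
    denoised = enhancer(noisy)
    local = F.mse_loss(denoised, clean)                       # local criterion
    with torch.no_grad():
        target = frozen_classifier(clean)                     # teacher signal
    mimic = F.mse_loss(frozen_classifier(denoised), target)   # global criterion
    return local + alpha * mimic

# Toy stand-ins for the enhancer and the pre-trained spectral classifier:
print(mimic_loss(nn.Identity(), nn.Linear(40, 10),
                 torch.randn(8, 40), torch.randn(8, 40)))
```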
A Multi-task Selected Learning Approach for Solving 3D Flexible Bin Packing Problem
Title | A Multi-task Selected Learning Approach for Solving 3D Flexible Bin Packing Problem |
Authors | Lu Duan, Haoyuan Hu, Yu Qian, Yu Gong, Xiaodong Zhang, Yinghui Xu, Jiangwen Wei |
Abstract | The 3D flexible bin packing problem (3D-FBPP) arises from the process of warehouse packing in e-commerce. An online customer's order usually contains several items and needs to be packed as a whole before shipping. In particular, 5% of the tens of millions of packages shipped every day use plastic wrapping as outer packaging, which creates pressure to minimize the plastic surface area in order to save logistics costs. Given this practical significance, we focus on the problem of packing cuboid-shaped items orthogonally into a least-surface-area bin. Existing heuristic methods for classic 3D bin packing do not work well for this particular NP-hard problem, and designing a good problem-specific heuristic is non-trivial. In this paper, rather than designing heuristics, we propose a novel multi-task framework based on Selected Learning to learn a heuristic-like policy that simultaneously generates the sequence and orientations of the items to be packed. Through comprehensive experiments on a large-scale real-world transaction order dataset and online A/B tests, we show that: 1) our selected learning method trades off the imbalance and correlation among the tasks and significantly outperforms both the single-task Pointer Network and the multi-task network without selected learning; 2) our method obtains an average cost reduction of 5.47% over the well-designed greedy algorithm previously used in our online production system. |
Tasks | Combinatorial Optimization |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06896v3 |
PDF | http://arxiv.org/pdf/1804.06896v3.pdf |
PWC | https://paperswithcode.com/paper/a-multi-task-selected-learning-approach-for |
Repo | |
Framework | |
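The objective the learned policy optimizes is the surface area of the enclosing bin; a small helper (our own phrasing of it) makes the quantity precise:

```python
# Surface area of the smallest axis-aligned bin enclosing all placed items.
def bin_surface_area(placements):
    """placements: list of (x, y, z, length, width, height) for packed items."""
    L = max(x + l for x, _, _, l, _, _ in placements)
    W = max(y + w for _, y, _, _, w, _ in placements)
    H = max(z + h for _, _, z, _, _, h in placements)
    return 2 * (L * W + L * H + W * H)

print(bin_surface_area([(0, 0, 0, 2, 2, 1), (2, 0, 0, 1, 2, 1)]))  # 3x2x1 -> 22
```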
Learn to Code-Switch: Data Augmentation using Copy Mechanism on Language Modeling
Title | Learn to Code-Switch: Data Augmentation using Copy Mechanism on Language Modeling |
Authors | Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, Pascale Fung |
Abstract | Building large-scale datasets for training code-switching language models is challenging and very expensive. Using a parallel corpus has been a major workaround to alleviate this problem. However, existing solutions use linguistic constraints which may not capture the real data distribution. In this work, we propose a novel method for learning how to generate code-switching sentences from parallel corpora. Our model uses a Seq2Seq model in combination with pointer networks to align and choose words from the monolingual sentences and form a grammatical code-switching sentence. In our experiments, we show that by training a language model on the augmented sentences, we improve the perplexity score by 10% compared to the LSTM baseline. |
Tasks | Data Augmentation, Language Modelling |
Published | 2018-10-24 |
URL | http://arxiv.org/abs/1810.10254v2 |
PDF | http://arxiv.org/pdf/1810.10254v2.pdf |
PWC | https://paperswithcode.com/paper/learn-to-code-switch-data-augmentation-using |
Repo | |
Framework | |
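For reference, the copy mechanism such models build on typically mixes a vocabulary distribution with a copy distribution over source positions via a learned gate; the sketch below is that generic formulation, with shapes and names as our assumptions rather than the paper's code.

```python
# Generic copy-mechanism mixture: p = p_gen * p_vocab + (1 - p_gen) * p_copy.
import torch

def copy_mixture(p_vocab, attn, src_token_ids, p_gen):
    # p_vocab: (B, V); attn: (B, S) over source positions;
    # src_token_ids: (B, S) maps each source position to a vocabulary id.
    p_copy = torch.zeros_like(p_vocab).scatter_add(1, src_token_ids, attn)
    return p_gen * p_vocab + (1 - p_gen) * p_copy

B, V, S = 2, 20, 5
p = copy_mixture(torch.softmax(torch.randn(B, V), -1),
                 torch.softmax(torch.randn(B, S), -1),
                 torch.randint(0, V, (B, S)),
                 p_gen=torch.rand(B, 1))
print(p.sum(-1))  # each row sums to 1, so it is a valid distribution
```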
Multi-set Canonical Correlation Analysis simply explained
Title | Multi-set Canonical Correlation Analysis simply explained |
Authors | Lucas C Parra |
Abstract | There are a multitude of methods to perform multi-set correlated component analysis (MCCA), including some that require iterative solutions. The methods differ in the criterion they optimize and the constraints placed on the solutions. This note focuses on perhaps the simplest version, which can be solved in a single step as the eigenvectors of the matrix ${\bf D}^{-1} {\bf R}$. Here ${\bf R}$ is the covariance matrix of the concatenated data, and ${\bf D}$ is its block-diagonal. This note shows that this solution maximizes inter-set correlation (ISC) without further constraints. It also relates the solution to a two-step procedure, which first whitens each dataset using PCA, and then performs an additional PCA on the concatenated and whitened data. Both of these solutions are known, although a clear derivation and simple implementation are hard to find. This short note aims to remedy that. |
Tasks | |
Published | 2018-02-11 |
URL | http://arxiv.org/abs/1802.03759v1 |
PDF | http://arxiv.org/pdf/1802.03759v1.pdf |
PWC | https://paperswithcode.com/paper/multi-set-canonical-correlation-analysis |
Repo | |
Framework | |
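The note's one-step solution translates directly into a few lines of NumPy: form the covariance ${\bf R}$ of the concatenated (centered) data, take its block-diagonal ${\bf D}$, and keep the leading eigenvectors of ${\bf D}^{-1}{\bf R}$. The synthetic-data demo is ours.

```python
# MCCA via the eigenvectors of D^{-1} R, as stated in the note.
import numpy as np

def mcca(datasets):
    """datasets: list of (n_samples, d_i) arrays. Returns (eigvals, eigvecs)."""
    X = np.hstack([Xi - Xi.mean(axis=0) for Xi in datasets])
    R = np.cov(X, rowvar=False)
    D = np.zeros_like(R)
    start = 0
    for Xi in datasets:
        d = Xi.shape[1]
        D[start:start + d, start:start + d] = R[start:start + d, start:start + d]
        start += d
    evals, evecs = np.linalg.eig(np.linalg.solve(D, R))   # eig of D^{-1} R
    order = np.argsort(-evals.real)
    return evals.real[order], evecs.real[:, order]

rng = np.random.default_rng(0)
s = rng.normal(size=(200, 1))                              # shared component
sets = [np.hstack([s, rng.normal(size=(200, 2))]) for _ in range(3)]
evals, W = mcca(sets)
print(round(evals[0], 2))  # ~3.0: a component perfectly shared by all 3 sets
```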
Bounding Box Embedding for Single Shot Person Instance Segmentation
Title | Bounding Box Embedding for Single Shot Person Instance Segmentation |
Authors | Jacob Richeimer, Jonathan Mitchell |
Abstract | We present a bottom-up approach for the task of object instance segmentation using a single-shot model. The proposed model employs a fully convolutional network which is trained to predict class-wise segmentation masks as well as the bounding boxes of the object instances to which each pixel belongs. This allows us to group object pixels into individual instances. Our network architecture is based on the DeepLabv3+ model, and requires only minimal extra computation to achieve pixel-wise instance assignments. We apply our method to the task of person instance segmentation, a common task relevant to many applications. We train our model with COCO data and report competitive results for the person class in the COCO instance segmentation task. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2018-07-20 |
URL | http://arxiv.org/abs/1807.07674v1 |
PDF | http://arxiv.org/pdf/1807.07674v1.pdf |
PWC | https://paperswithcode.com/paper/bounding-box-embedding-for-single-shot-person |
Repo | |
Framework | |
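A rough sketch of the grouping step implied by the abstract: every foreground pixel predicts the bounding box of its instance, and pixels whose predicted boxes overlap strongly are assigned to the same instance. The greedy IoU clustering below is our assumption, not the paper's exact procedure.

```python
# Group pixels by agreement of their predicted instance bounding boxes.
import numpy as np

def iou(a, b):
    x1, y1 = np.maximum(a[:2], b[:2]); x2, y2 = np.minimum(a[2:], b[2:])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-8)

def group_pixels(pred_boxes, thresh=0.5):
    """pred_boxes: (n_pixels, 4) as (x1, y1, x2, y2). Returns instance ids."""
    ids, protos = -np.ones(len(pred_boxes), dtype=int), []
    for i, box in enumerate(pred_boxes):
        for k, proto in enumerate(protos):
            if iou(box, proto) >= thresh:
                ids[i] = k
                break
        else:                      # no prototype matched: start a new instance
            protos.append(box)
            ids[i] = len(protos) - 1
    return ids

boxes = np.array([[0, 0, 10, 10], [1, 0, 10, 10], [50, 50, 60, 60]], float)
print(group_pixels(boxes))  # [0 0 1]: first two pixels share an instance
```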
Manipulating and Measuring Model Interpretability
Title | Manipulating and Measuring Model Interpretability |
Authors | Forough Poursabzi-Sangdeh, Daniel G. Goldstein, Jake M. Hofman, Jennifer Wortman Vaughan, Hanna Wallach |
Abstract | With the increased use of machine learning in decision-making scenarios, there has been a growing interest in creating human-interpretable machine learning models. While many such models have been proposed, there have been relatively few experimental studies of whether these models achieve their intended effects, such as encouraging people to follow the model’s predictions when the model is correct and to deviate when it makes a mistake. We present a series of randomized, pre-registered experiments comprising 3,800 participants in which people were shown functionally identical models that varied only in two factors thought to influence interpretability: the number of input features and the model transparency (clear or black-box). Predictably, participants who were shown a clear model with a small number of features were better able to simulate the model’s predictions. However, contrary to what one might expect when manipulating interpretability, we found no improvements in the degree to which participants followed the model’s predictions when it was beneficial to do so. Even more surprisingly, increased transparency hampered people’s ability to detect when the model makes a sizable mistake and correct for it, seemingly due to information overload. These counterintuitive results suggest that decision scientists creating interpretable models should harbor a healthy skepticism of their intuitions and empirically verify that interpretable models achieve their intended effects. |
Tasks | Decision Making, Interpretable Machine Learning |
Published | 2018-02-21 |
URL | https://arxiv.org/abs/1802.07810v3 |
PDF | https://arxiv.org/pdf/1802.07810v3.pdf |
PWC | https://paperswithcode.com/paper/manipulating-and-measuring-model |
Repo | |
Framework | |
Semantically-Aligned Equation Generation for Solving and Reasoning Math Word Problems
Title | Semantically-Aligned Equation Generation for Solving and Reasoning Math Word Problems |
Authors | Ting-Rui Chiang, Yun-Nung Chen |
Abstract | Solving math word problems is a challenging task that requires accurate natural language understanding to bridge natural language texts and math expressions. Motivated by intuitions about how humans generate equations from problem texts, this paper presents a neural approach that automatically solves math word problems by operating on symbols according to their semantic meanings in the text. The paper views the process of generating an equation as a bridge between the semantic world and the symbolic world, where the proposed neural math solver is based on an encoder-decoder framework. In the proposed model, the encoder is designed to understand the semantics of the problem, and the decoder focuses on tracking the semantic meanings of the generated symbols and then deciding which symbol to generate next. Preliminary experiments on the Math23K dataset show that our model significantly outperforms both the state-of-the-art single model and the best non-retrieval-based model by about 10% accuracy, demonstrating the effectiveness of bridging the symbolic and semantic worlds in math word problems. |
Tasks | Math Word Problem Solving |
Published | 2018-11-02 |
URL | https://arxiv.org/abs/1811.00720v2 |
PDF | https://arxiv.org/pdf/1811.00720v2.pdf |
PWC | https://paperswithcode.com/paper/semantically-aligned-equation-generation-for |
Repo | |
Framework | |
Asymptotic Soft Filter Pruning for Deep Convolutional Neural Networks
Title | Asymptotic Soft Filter Pruning for Deep Convolutional Neural Networks |
Authors | Yang He, Xuanyi Dong, Guoliang Kang, Yanwei Fu, Chenggang Yan, Yi Yang |
Abstract | Deeper and wider Convolutional Neural Networks (CNNs) achieve superior performance but bring expensive computation costs. Accelerating such over-parameterized neural networks has received increasing attention. A typical pruning algorithm is a three-stage pipeline: training, pruning, and retraining. Prevailing approaches fix the pruned filters to zero during retraining, and thus significantly reduce the optimization space. Moreover, they prune a large number of filters at the start, which can cause unrecoverable information loss. To solve these problems, we propose an Asymptotic Soft Filter Pruning (ASFP) method to accelerate the inference of deep neural networks. First, we update the pruned filters during the retraining stage. As a result, the optimization space of the pruned model is not reduced but remains the same as that of the original model, so the model has enough capacity to learn from the training data. Second, we prune the network asymptotically: a few filters at first, and asymptotically more as training proceeds. With asymptotic pruning, the information of the training set is gradually concentrated in the remaining filters, so the subsequent training and pruning process stays stable. Experiments show the effectiveness of ASFP on image classification benchmarks. Notably, on ILSVRC-2012, ASFP reduces more than 40% of the FLOPs of ResNet-50 with only 0.14% top-5 accuracy degradation, 8% better than soft filter pruning (SFP). |
Tasks | Image Classification |
Published | 2018-08-22 |
URL | https://arxiv.org/abs/1808.07471v4 |
PDF | https://arxiv.org/pdf/1808.07471v4.pdf |
PWC | https://paperswithcode.com/paper/progressive-deep-neural-networks-acceleration |
Repo | |
Framework | |
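The two ingredients above can be sketched in a few lines (our own simplification): soft pruning zeroes the lowest-norm filters while leaving them trainable, and the pruning rate grows asymptotically toward its target over epochs. The schedule's exact form here is assumed.

```python
# Soft filter pruning with an asymptotically increasing pruning rate.
import math
import torch

def soft_prune_filters(conv_weight, prune_rate):
    """Zero the output filters with the smallest L2 norm, in place."""
    n_out = conv_weight.size(0)
    n_prune = int(prune_rate * n_out)
    if n_prune > 0:
        norms = conv_weight.view(n_out, -1).norm(dim=1)
        idx = norms.argsort()[:n_prune]
        with torch.no_grad():
            conv_weight[idx] = 0.0   # zeroed now, still updated by training
    return conv_weight

def asymptotic_rate(epoch, target=0.4, speed=0.1):
    return target * (1.0 - math.exp(-speed * epoch))

w = torch.randn(64, 32, 3, 3, requires_grad=True)
soft_prune_filters(w, asymptotic_rate(epoch=30))        # rate ~0.38 of 0.4
print((w.view(64, -1).norm(dim=1) == 0).sum().item())   # 24 filters zeroed
```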
Automating Generation of Low Precision Deep Learning Operators
Title | Automating Generation of Low Precision Deep Learning Operators |
Authors | Meghan Cowan, Thierry Moreau, Tianqi Chen, Luis Ceze |
Abstract | State-of-the-art deep learning models have made steady progress in the fields of computer vision and natural language processing, at the expense of growing model sizes and computational complexity. Deploying these models on low-power and mobile devices poses a challenge due to their limited compute capabilities and strict energy budgets. One solution that has generated significant research interest is deploying highly quantized models that operate on low-precision inputs and weights of less than eight bits, trading off accuracy for performance. These models have a significantly reduced memory footprint (up to 32x) and can replace multiply-accumulates with bitwise operations in compute-intensive convolution and fully connected layers. Most deep learning frameworks rely on highly engineered linear algebra libraries such as ATLAS or Intel's MKL to implement efficient deep learning operators. To date, none of the popular deep learning frameworks directly support low-precision operators, partly due to a lack of optimized low-precision libraries. In this paper we introduce a workflow to quickly generate high-performance low-precision deep learning operators for arbitrary precision that target multiple CPU architectures and include optimizations such as memory tiling and vectorization. We present an extensive case study on a low-power ARM Cortex-A53 CPU, and show how we can generate 1-bit and 2-bit convolutions with speedups of up to 16x over an optimized 16-bit integer baseline and 2.3x over handwritten implementations. |
Tasks | |
Published | 2018-10-25 |
URL | http://arxiv.org/abs/1810.11066v1 |
PDF | http://arxiv.org/pdf/1810.11066v1.pdf |
PWC | https://paperswithcode.com/paper/automating-generation-of-low-precision-deep |
Repo | |
Framework | |
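The core arithmetic trick behind such kernels is standard: with values constrained to +/-1 and packed into machine words, a dot product becomes an XNOR followed by a popcount. The pure-Python toy below shows the identity only, not the paper's generated code.

```python
# Binary dot product: XNOR then popcount, for +/-1 vectors packed into bits.
def binary_dot(a_bits, b_bits, n):
    """Low n bits of a_bits/b_bits encode +1 (bit=1) / -1 (bit=0) vectors."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)
    matches = bin(xnor).count("1")     # popcount: positions that agree
    return 2 * matches - n             # agreements minus disagreements

# (+1,-1,+1,+1) . (+1,+1,-1,+1) = 1 - 1 - 1 + 1 = 0
print(binary_dot(0b1011, 0b1101, 4))   # 0
```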