Paper Group ANR 1042
Video Captioning with Text-based Dynamic Attention and Step-by-Step Learning
Title | Video Captioning with Text-based Dynamic Attention and Step-by-Step Learning |
Authors | Huanhou Xiao, Jinglun Shi |
Abstract | Automatically describing video content with natural language has been attracting much attention in the CV and NLP communities. Most existing methods predict one word at a time, feeding the last generated word back as input at the next time step, while the other generated words are not fully exploited. Furthermore, traditional methods optimize the model using all the training samples in each epoch without considering their learning situations, which leads to a lot of unnecessary training and cannot target the difficult samples. To address these issues, we propose a text-based dynamic attention model named TDAM, which imposes a dynamic attention mechanism on all the generated words with the motivation to improve the context semantic information and enhance the overall control of the whole sentence. Moreover, the text-based dynamic attention mechanism and the visual attention mechanism are linked together to focus on the important words. They can benefit from each other during training. Accordingly, the model is trained through two steps: “starting from scratch” and “checking for gaps”. The former uses all the samples to optimize the model, while the latter only trains for samples with poor control. Experimental results on the popular datasets MSVD and MSR-VTT demonstrate that our non-ensemble model outperforms the state-of-the-art video captioning benchmarks. |
Tasks | Video Captioning |
Published | 2019-11-05 |
URL | https://arxiv.org/abs/1911.01857v1 |
PDF | https://arxiv.org/pdf/1911.01857v1.pdf |
PWC | https://paperswithcode.com/paper/video-captioning-with-text-based-dynamic |
Repo | |
Framework | |
An adaptive stochastic optimization algorithm for resource allocation
Title | An adaptive stochastic optimization algorithm for resource allocation |
Authors | Xavier Fontaine, Shie Mannor, Vianney Perchet |
Abstract | We consider the classical problem of sequential resource allocation where a decision maker must repeatedly divide a budget between several resources, each with diminishing returns. This can be recast as a specific stochastic optimization problem where the objective is to maximize the cumulative reward, or equivalently to minimize the regret. We construct an algorithm that is adaptive to the complexity of the problem, expressed in terms of the regularity of the returns of the resources, measured by the exponent in the Łojasiewicz inequality (or by their universal concavity parameter). Our parameter-independent algorithm recovers the optimal rates for strongly-concave functions and the classical fast rates of multi-armed bandits (for linear reward functions). Moreover, the algorithm improves existing results on stochastic optimization in this regret minimization setting for intermediate cases. |
Tasks | Stochastic Optimization |
Published | 2019-02-12 |
URL | https://arxiv.org/abs/1902.04376v3 |
PDF | https://arxiv.org/pdf/1902.04376v3.pdf |
PWC | https://paperswithcode.com/paper/a-problem-adaptive-algorithm-for-resource |
Repo | |
Framework | |
Safe Sample Screening for Robust Support Vector Machine
Title | Safe Sample Screening for Robust Support Vector Machine |
Authors | Zhou Zhai, Bin Gu, Xiang Li, Heng Huang |
Abstract | Robust support vector machine (RSVM) has been shown to perform remarkably well at improving the generalization performance of support vector machines in noisy environments. Unfortunately, in order to handle the non-convexity induced by ramp loss in RSVM, existing RSVM solvers often adopt the DC programming framework, which is computationally inefficient because it runs multiple outer loops. This hinders the application of RSVM to large-scale problems. Safe sample screening, which allows for the exclusion of training samples prior to or early in the training process, is an effective method to greatly reduce computational time. However, existing safe sample screening algorithms are limited to convex optimization problems while RSVM is a non-convex problem. To address this challenge, in this paper, we propose two safe sample screening rules for RSVM based on the framework of the concave-convex procedure (CCCP). Specifically, we provide a screening rule for the inner solver of CCCP and another rule for propagating screened samples between two successive solvers of CCCP. To the best of our knowledge, this is the first work to apply safe sample screening to a non-convex optimization problem. More importantly, we provide a security guarantee for our sample screening rules for RSVM. Experimental results on a variety of benchmark datasets verify that our safe sample screening rules can significantly reduce the computational time. |
Tasks | |
Published | 2019-12-24 |
URL | https://arxiv.org/abs/1912.11217v1 |
PDF | https://arxiv.org/pdf/1912.11217v1.pdf |
PWC | https://paperswithcode.com/paper/safe-sample-screening-for-robust-support |
Repo | |
Framework | |
The Bach Doodle: Approachable music composition with machine learning at scale
Title | The Bach Doodle: Approachable music composition with machine learning at scale |
Authors | Cheng-Zhi Anna Huang, Curtis Hawthorne, Adam Roberts, Monica Dinculescu, James Wexler, Leon Hong, Jacob Howcroft |
Abstract | To make music composition more approachable, we designed the first AI-powered Google Doodle, the Bach Doodle, where users can create their own melody and have it harmonized in the style of Bach by a machine learning model, Coconet (Huang et al., 2017). For users to input melodies, we designed a simplified sheet-music based interface. To support an interactive experience at scale, we re-implemented Coconet in TensorFlow.js (Smilkov et al., 2019) to run in the browser and reduced its runtime from 40s to 2s by adopting dilated depth-wise separable convolutions and fusing operations. We also reduced the model download size to approximately 400KB through post-training weight quantization. We calibrated a speed test based on partial model evaluation time to determine if the harmonization request should be performed locally or sent to remote TPU servers. In three days, people spent 350 years' worth of time playing with the Bach Doodle, and Coconet received more than 55 million queries. Users could choose to rate their compositions and contribute them to a public dataset, which we are releasing with this paper. We hope that the community finds this dataset useful for applications ranging from ethnomusicological studies, to music education, to improving machine learning models. |
Tasks | Quantization |
Published | 2019-07-14 |
URL | https://arxiv.org/abs/1907.06637v1 |
PDF | https://arxiv.org/pdf/1907.06637v1.pdf |
PWC | https://paperswithcode.com/paper/the-bach-doodle-approachable-music |
Repo | |
Framework | |
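The abstract above mentions post-training weight quantization as the way the model download was shrunk. As a rough illustration only, the sketch below shows a generic 8-bit affine quantization of a list of weights in plain Python. The function names `quantize` and `dequantize`, the 8-bit width, and the sample weights are assumptions made for this example; the abstract does not say which quantization scheme the authors actually used.

```python
def quantize(weights, num_bits=8):
    """Map floats to integer codes in [0, 2**num_bits - 1] (affine scheme)."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** num_bits - 1
    # Guard against a constant weight list, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate float weights from integer codes."""
    return [lo + c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [-0.51, -0.02, 0.0, 0.13, 0.74]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)

# Rounding to the nearest level bounds the error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored as one byte plus a shared `scale` and offset, which is where the size reduction comes from; the price is a reconstruction error of at most half a step per weight.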
Semantics of higher-order probabilistic programs with conditioning
Title | Semantics of higher-order probabilistic programs with conditioning |
Authors | Fredrik Dahlqvist, Dexter Kozen |
Abstract | We present a denotational semantics for higher-order probabilistic programs in terms of linear operators between Banach spaces. Our semantics is rooted in the classical theory of Banach spaces and their tensor products, but bears similarities with the well-known Scott semantics of higher-order programs through the use of ordered Banach spaces, which allow definitions in terms of fixed points. Being based on a monoidal rather than cartesian closed structure, our semantics effectively treats randomness as a resource. |
Tasks | |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1902.11189v1 |
PDF | http://arxiv.org/pdf/1902.11189v1.pdf |
PWC | https://paperswithcode.com/paper/semantics-of-higher-order-probabilistic |
Repo | |
Framework | |
Identifying Malicious Players in GWAP-based Disaster Monitoring Crowdsourcing System
Title | Identifying Malicious Players in GWAP-based Disaster Monitoring Crowdsourcing System |
Authors | Changkun Ou, Yifei Zhan, Yaxi Chen |
Abstract | Disaster monitoring is challenging due to the lack of infrastructure in monitoring areas. Based on the theory of Game-With-A-Purpose (GWAP), this paper contributes a novel large-scale crowdsourcing disaster monitoring system. The system analyzes tagged satellite pictures from anonymous players, and then reports aggregated and evaluated monitoring results to its stakeholders. An algorithm based on directed graph centralities is presented to address the core issues of malicious user detection and disaster level calculation. Our method can be easily applied in other human computation systems. Finally, some open issues and possible solutions are discussed as future work. |
Tasks | |
Published | 2019-09-14 |
URL | https://arxiv.org/abs/1910.01459v1 |
PDF | https://arxiv.org/pdf/1910.01459v1.pdf |
PWC | https://paperswithcode.com/paper/identifying-malicious-players-in-gwap-based |
Repo | |
Framework | |
DeepMimic: Mentor-Student Unlabeled Data Based Training
Title | DeepMimic: Mentor-Student Unlabeled Data Based Training |
Authors | Itay Mosafi, Eli David, Nathan S. Netanyahu |
Abstract | In this paper, we present a deep neural network (DNN) training approach called the “DeepMimic” training method. Enormous amounts of data are available nowadays for training usage. Yet, only a tiny portion of these data is manually labeled, whereas almost all of the data are unlabeled. The training approach presented utilizes, in a most simplified manner, the unlabeled data to the fullest, in order to achieve remarkable (classification) results. Our DeepMimic method uses a small portion of labeled data and a large amount of unlabeled data for the training process, as expected in a real-world scenario. It consists of a mentor model and a student model. Employing a mentor model trained on a small portion of the labeled data and then feeding it only with unlabeled data, we show how to obtain a (simplified) student model that reaches the same accuracy and loss as the mentor model, on the same test set, without using any of the original data labels in the training of the student model. Our experiments demonstrate that even on challenging classification tasks the student network architecture can be simplified significantly with a minor influence on the performance, i.e., we need not even know the original network architecture of the mentor. In addition, the time required for training the student model to reach the mentor’s performance level is shorter, as a result of a simplified architecture and more available data. The proposed method highlights the disadvantages of regular supervised training and demonstrates the benefits of a less traditional training approach. |
Tasks | |
Published | 2019-11-24 |
URL | https://arxiv.org/abs/1912.00079v1 |
PDF | https://arxiv.org/pdf/1912.00079v1.pdf |
PWC | https://paperswithcode.com/paper/deepmimic-mentor-student-unlabeled-data-based |
Repo | |
Framework | |
Classification of chaotic time series with deep learning
Title | Classification of chaotic time series with deep learning |
Authors | Nicolas Boullé, Vassilios Dallas, Yuji Nakatsukasa, D. Samaddar |
Abstract | We use standard deep neural networks to classify univariate time series generated by discrete and continuous dynamical systems based on their chaotic or non-chaotic behaviour. Our approach to circumvent the lack of precise models for some of the most challenging real-life applications is to train different neural networks on a data set from a dynamical system with a basic or low-dimensional phase space and then use these networks to classify univariate time series of a dynamical system with more intricate or high-dimensional phase space. We illustrate this generalisation approach using the logistic map, the sine-circle map, the Lorenz system, and the Kuramoto–Sivashinsky equation. We observe that a convolutional neural network without batch normalization layers outperforms state-of-the-art neural networks for time series classification and is able to generalise and classify time series as chaotic or not with high accuracy. |
Tasks | Time Series, Time Series Classification |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1908.06848v3 |
PDF | https://arxiv.org/pdf/1908.06848v3.pdf |
PWC | https://paperswithcode.com/paper/classification-of-chaotic-time-series-with |
Repo | |
Framework | |
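The abstract above trains classifiers on time series from simple systems such as the logistic map. The sketch below shows one plausible way to generate such labeled series in plain Python: it iterates the logistic map x_{n+1} = r x_n (1 - x_n) and labels a series as chaotic when a finite-sample estimate of the Lyapunov exponent is positive. Labeling by the sign of the Lyapunov exponent, the function names, and the parameter values are assumptions for illustration; the abstract does not describe the authors' labeling procedure.

```python
import math


def logistic_series(r, x0=0.2, n=500, burn_in=100):
    """Iterate x -> r * x * (1 - x), discarding an initial transient."""
    x = x0
    series = []
    for i in range(n + burn_in):
        x = r * x * (1.0 - x)
        if i >= burn_in:
            series.append(x)
    return series


def lyapunov_exponent(r, series):
    """Average log |f'(x)| along the orbit, with f'(x) = r * (1 - 2x)."""
    total = 0.0
    for x in series:
        # The floor avoids log(0) if the orbit lands exactly on x = 0.5.
        total += math.log(max(abs(r * (1.0 - 2.0 * x)), 1e-300))
    return total / len(series)


def label(r):
    """Return a class label for the logistic map at parameter r."""
    series = logistic_series(r)
    return "chaotic" if lyapunov_exponent(r, series) > 0 else "non-chaotic"


# r = 3.2 settles on a period-2 cycle; r = 3.9 lies in the chaotic regime.
examples = {r: label(r) for r in (3.2, 3.9)}
```

A dataset for a classifier could then pair each `logistic_series(r)` with `label(r)` over a sweep of `r` values.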
Multi-lingual Intent Detection and Slot Filling in a Joint BERT-based Model
Title | Multi-lingual Intent Detection and Slot Filling in a Joint BERT-based Model |
Authors | Giuseppe Castellucci, Valentina Bellomaria, Andrea Favalli, Raniero Romagnoli |
Abstract | Intent Detection and Slot Filling are two pillar tasks in Spoken Natural Language Understanding. Common approaches adopt joint Deep Learning architectures in attention-based recurrent frameworks. In this work, we aim at exploiting the success of “recurrence-less” models for these tasks. We introduce Bert-Joint, i.e., a multi-lingual joint text classification and sequence labeling framework. The experimental evaluation over two well-known English benchmarks demonstrates the strong performance that can be obtained with this model, even when little annotated data is available. Moreover, we annotated a new dataset for the Italian language, and we observed similar performance without the need for changing the model. |
Tasks | Intent Detection, Slot Filling, Text Classification |
Published | 2019-07-05 |
URL | https://arxiv.org/abs/1907.02884v1 |
PDF | https://arxiv.org/pdf/1907.02884v1.pdf |
PWC | https://paperswithcode.com/paper/multi-lingual-intent-detection-and-slot |
Repo | |
Framework | |
Automated Search for Configurations of Deep Neural Network Architectures
Title | Automated Search for Configurations of Deep Neural Network Architectures |
Authors | Salah Ghamizi, Maxime Cordy, Mike Papadakis, Yves Le Traon |
Abstract | Deep Neural Networks (DNNs) are intensively used to solve a wide variety of complex problems. Although powerful, such systems require manual configuration and tuning. To this end, we view DNNs as configurable systems and propose an end-to-end framework that allows the configuration, evaluation and automated search for DNN architectures. Therefore, our contribution is threefold. First, we model the variability of DNN architectures with a Feature Model (FM) that generalizes over existing architectures. Each valid configuration of the FM corresponds to a valid DNN model that can be built and trained. Second, we implement, on top of Tensorflow, an automated procedure to deploy, train and evaluate the performance of a configured model. Third, we propose a method to search for configurations and demonstrate that it leads to good DNN models. We evaluate our method by applying it on image classification tasks (MNIST, CIFAR-10) and show that, with a limited amount of computation and training, our method can identify high-performing architectures (with high accuracy). We also demonstrate that we outperform existing state-of-the-art architectures handcrafted by ML researchers. Our FM and framework have been released to support replication and future research. |
Tasks | Image Classification |
Published | 2019-04-09 |
URL | http://arxiv.org/abs/1904.04612v1 |
PDF | http://arxiv.org/pdf/1904.04612v1.pdf |
PWC | https://paperswithcode.com/paper/automated-search-for-configurations-of-deep |
Repo | |
Framework | |
Formal Language Constraints for Markov Decision Processes
Title | Formal Language Constraints for Markov Decision Processes |
Authors | Eleanor Quint, Dong Xu, Haluk Dogan, Zeynep Hakguder, Stephen Scott, Matthew Dwyer |
Abstract | In order to satisfy safety conditions, a reinforcement learning (RL) agent may be constrained from acting freely, e.g., to prevent trajectories that might cause unwanted behavior or physical damage in a robot. We propose a general framework for augmenting a Markov decision process (MDP) with constraints that are described in formal languages over sequences of MDP states and agent actions. Constraint enforcement is implemented by filtering the allowed action set or by applying potential-based reward shaping to implement hard and soft constraint enforcement, respectively. We instantiate this framework using deterministic finite automata to encode constraints and propose methods of augmenting MDP observations with the state of the constraint automaton for learning. We empirically evaluate these methods with a variety of constraints by training Deep Q-Networks in Atari games as well as Proximal Policy Optimization in MuJoCo environments. We experimentally find that our approaches are effective in significantly reducing or eliminating constraint violations with either minimal negative or, depending on the constraint, a clear positive impact on final performance. |
Tasks | Atari Games |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.01074v1 |
PDF | https://arxiv.org/pdf/1910.01074v1.pdf |
PWC | https://paperswithcode.com/paper/formal-language-constraints-for-markov |
Repo | |
Framework | |
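The abstract above describes hard constraint enforcement by filtering the allowed action set with a deterministic finite automaton (DFA), and augmenting observations with the automaton state. The following is a minimal sketch of that idea in plain Python, under stated assumptions: the class `ConstraintDFA`, the action names, and the example constraint ("never fire twice in a row") are all hypothetical and are not taken from the paper.

```python
class ConstraintDFA:
    """A DFA over agent actions; some states are marked as violations."""

    def __init__(self, transitions, start, violating):
        self.transitions = transitions  # maps (state, action) -> next state
        self.start = start
        self.violating = set(violating)
        self.state = start

    def peek(self, action):
        """Next state if `action` were taken (unlisted pairs reset to start)."""
        return self.transitions.get((self.state, action), self.start)

    def step(self, action):
        """Advance the automaton after the agent actually takes `action`."""
        self.state = self.peek(action)
        return self.state


def allowed_actions(dfa, actions):
    """Hard enforcement: drop every action that would enter a violating state."""
    return [a for a in actions if dfa.peek(a) not in dfa.violating]


def augment(observation, dfa):
    """Expose the constraint state to the learner alongside the observation."""
    return (observation, dfa.state)


# Hypothetical constraint: state 1 = "just fired", state 2 = "fired twice".
dfa = ConstraintDFA({(0, "fire"): 1, (1, "fire"): 2}, start=0, violating=[2])
actions = ["noop", "fire", "left"]

before = allowed_actions(dfa, actions)  # nothing is filtered yet
dfa.step("fire")
after = allowed_actions(dfa, actions)   # "fire" is now filtered out
```

The soft variant mentioned in the abstract would instead keep all actions available and adjust the reward; that part is not sketched here.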
CorrGAN: Sampling Realistic Financial Correlation Matrices Using Generative Adversarial Networks
Title | CorrGAN: Sampling Realistic Financial Correlation Matrices Using Generative Adversarial Networks |
Authors | Gautier Marti |
Abstract | We propose a novel approach for sampling realistic financial correlation matrices. This approach is based on generative adversarial networks. Experiments demonstrate that generative adversarial networks are able to recover most of the known stylized facts about empirical correlation matrices estimated on asset returns. This is the first time such results are documented in the literature. Practical financial applications range from trading strategies enhancement to risk and portfolio stress testing. Such generative models can also help ground empirical finance deeper into science by allowing for falsifiability of statements and more objective comparison of empirical methods. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09504v2 |
PDF | https://arxiv.org/pdf/1910.09504v2.pdf |
PWC | https://paperswithcode.com/paper/corrgan-sampling-realistic-financial |
Repo | |
Framework | |
PSDNet and DPDNet: Efficient channel expansion, Depthwise-Pointwise-Depthwise Inverted Bottleneck Block
Title | PSDNet and DPDNet: Efficient channel expansion, Depthwise-Pointwise-Depthwise Inverted Bottleneck Block |
Authors | Guoqing Li, Meng Zhang, Qianru Zhang, Ziyang Chen, Wenzhao Liu, Jiaojie Li, Xuzhao Shen, Jianjun Li, Zhenyu Zhu, Chau Yuen |
Abstract | In many real-time applications, the deployment of deep neural networks is constrained by high computational cost, and efficient lightweight neural networks are of wide interest. In this paper, we propose using depthwise convolution (DWC) to expand the number of channels in a bottleneck block, which is more efficient than 1 x 1 convolution. The proposed Pointwise-Standard-Depthwise network (PSDNet), based on channel expansion with DWC, has fewer parameters, lower computational cost and higher accuracy than the corresponding ResNet on CIFAR datasets. To design a more efficient lightweight convolutional neural network, the Depthwise-Pointwise-Depthwise inverted bottleneck block (DPD block) is proposed, and DPDNet is designed by stacking DPD blocks. Meanwhile, the number of parameters of DPDNet is only about 60% of that of MobileNetV2 for networks with the same number of layers, yet it achieves comparable accuracy. Additionally, two hyperparameters of DPDNet control the trade-off between accuracy and computational cost, which makes DPDNet suitable for diverse tasks. Furthermore, we find that networks with more DWC layers outperform networks with more 1 x 1 convolution layers, which indicates that extracting spatial information is more important than combining channel information. |
Tasks | |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01026v2 |
PDF | https://arxiv.org/pdf/1909.01026v2.pdf |
PWC | https://paperswithcode.com/paper/psdnet-and-dpdnet-efficient-channel-expansion |
Repo | |
Framework | |
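The abstract above claims that expanding channels with a depthwise convolution is cheaper than doing so with a 1 x 1 convolution. A back-of-the-envelope parameter count makes the claim concrete. The sketch below assumes bias-free layers, a 3 x 3 depthwise kernel, and a depthwise layer in which each input channel produces `expansion` output channels; these are assumptions for illustration, and the exact block layout of PSDNet and DPDNet is not given in the abstract.

```python
def pointwise_expand_params(c_in, expansion):
    """Weights of a 1x1 convolution mapping c_in -> c_in * expansion channels."""
    return c_in * (c_in * expansion)


def depthwise_expand_params(c_in, expansion, kernel=3):
    """Weights of a depthwise convolution with a channel multiplier.

    Each input channel gets `expansion` separate kernel x kernel filters.
    """
    return kernel * kernel * c_in * expansion


# Hypothetical layer sizes, chosen only for illustration.
c_in, expansion = 64, 6
pointwise = pointwise_expand_params(c_in, expansion)   # 64 * 384
depthwise = depthwise_expand_params(c_in, expansion)   # 9 * 64 * 6
ratio = pointwise / depthwise
```

Under these assumptions the depthwise expansion uses fewer weights whenever `kernel * kernel < c_in`, which holds for typical channel counts; the multiply-accumulate count per output position scales the same way.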
DeepWait: Pedestrian Wait Time Estimation in Mixed Traffic Conditions Using Deep Survival Analysis
Title | DeepWait: Pedestrian Wait Time Estimation in Mixed Traffic Conditions Using Deep Survival Analysis |
Authors | Arash Kalatian, Bilal Farooq |
Abstract | Pedestrians’ road-crossing behaviour is one of the important aspects of urban dynamics that will be affected by the introduction of autonomous vehicles. In this study we introduce DeepSurvival, a novel framework for estimating pedestrians’ waiting time at unsignalized mid-block crosswalks in mixed traffic conditions. We exploit the strengths of deep learning in capturing the nonlinearities in the data and develop a Cox proportional hazards model with a deep neural network as the log-risk function. An embedded feature selection algorithm for reducing data dimensionality and enhancing the interpretability of the network is also developed. We test our framework on a dataset collected from 160 participants using an immersive virtual reality environment. Validation results showed that, with a C-index of 0.64, our proposed framework outperformed the standard Cox proportional hazards-based model, which had a C-index of 0.58. |
Tasks | Autonomous Vehicles, Feature Selection, Survival Analysis |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.11008v3 |
PDF | https://arxiv.org/pdf/1904.11008v3.pdf |
PWC | https://paperswithcode.com/paper/190411008 |
Repo | |
Framework | |
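The abstract above compares models by their C-index (concordance index). For readers unfamiliar with the metric, here is a minimal pure-Python sketch of Harrell's C-index for right-censored data: among comparable pairs, it measures how often the subject with the shorter observed time was assigned the higher risk score. The function name and the toy data are assumptions for illustration and are unrelated to the paper's dataset.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index.

    times:  observed waiting times
    events: 1 if the event was observed (e.g. the pedestrian crossed),
            0 if the observation was censored
    risks:  model scores, where a higher score predicts a shorter time
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier one in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in score count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third subject is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.3, 0.5, 0.1]
c_index = concordance_index(times, events, risks)  # 4 of 5 pairs concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which puts the reported 0.64 and 0.58 in context.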
An Information-Theoretical Approach to the Information Capacity and Cost-Effectiveness Evaluation of Color Palettes
Title | An Information-Theoretical Approach to the Information Capacity and Cost-Effectiveness Evaluation of Color Palettes |
Authors | R. Tanju Sirmen, B. Burak Ustundag |
Abstract | Colors are effective tools for representing and transferring information. The number of colors in a palette directly determines its information-conveying capacity. Yet the palette should be designed with care, since increasing the entropy by adding colors comes at a cost in decoding. Despite the possible effects on diverse applications, a methodology for evaluating the cost-effectiveness of palettes seems to be lacking. In this work, this need is addressed from an information-theoretical perspective through a set of metrics and formulae. In addition, the proposed metrics are computed for several newly developed and well-known palettes, and the observed results are evaluated. |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.02567v1 |
PDF | https://arxiv.org/pdf/1906.02567v1.pdf |
PWC | https://paperswithcode.com/paper/190602567 |
Repo | |
Framework | |
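The abstract above ties the number of colors in a palette to its information-conveying capacity and entropy. The sketch below illustrates that relationship using standard Shannon entropy in plain Python: a palette of N equally likely colors carries log2(N) bits per colored symbol, and uneven usage lowers that figure. The function names are hypothetical, and the paper's own metrics, including its decoding-cost term, are not specified in the abstract and are not modeled here.

```python
import math


def palette_entropy(probabilities):
    """Shannon entropy in bits of a color-usage distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def uniform_capacity_bits(num_colors):
    """Bits per symbol when all colors are used equally often."""
    return math.log2(num_colors)


# Doubling the palette from 8 to 16 colors adds exactly one bit per symbol.
bits_8 = uniform_capacity_bits(8)
bits_16 = uniform_capacity_bits(16)

# A skewed four-color palette conveys less than the uniform two bits.
skewed = palette_entropy([0.7, 0.1, 0.1, 0.1])
uniform = palette_entropy([0.25, 0.25, 0.25, 0.25])
```

The logarithmic growth is the point of interest: each added bit requires doubling the number of colors, while the colors themselves become harder to tell apart.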