Paper Group ANR 1642
Sparse Sequence-to-Sequence Models
Title | Sparse Sequence-to-Sequence Models |
Authors | Ben Peters, Vlad Niculae, André F. T. Martins |
Abstract | Sequence-to-sequence models are a powerful workhorse of NLP. Most variants employ a softmax transformation in both their attention mechanism and output layer, leading to dense alignments and strictly positive output probabilities. This density is wasteful, making models less interpretable and assigning probability mass to many implausible outputs. In this paper, we propose sparse sequence-to-sequence models, rooted in a new family of $\alpha$-entmax transformations, which includes softmax and sparsemax as particular cases, and is sparse for any $\alpha > 1$. We provide fast algorithms to evaluate these transformations and their gradients, which scale well for large vocabulary sizes. Our models are able to produce sparse alignments and to assign nonzero probability to a short list of plausible outputs, sometimes rendering beam search exact. Experiments on morphological inflection and machine translation reveal consistent gains over dense models. |
Tasks | Machine Translation, Morphological Inflection |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05702v2 |
PDF | https://arxiv.org/pdf/1905.05702v2.pdf |
PWC | https://paperswithcode.com/paper/sparse-sequence-to-sequence-models |
Repo | |
Framework | |
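The $\alpha$-entmax family described in the abstract includes sparsemax ($\alpha = 2$) as a special case, which has a closed form: the Euclidean projection of the score vector onto the probability simplex. A minimal NumPy sketch of that special case (not the authors' released implementation, which also covers general $\alpha$ and the gradients):

```python
import numpy as np

def sparsemax(z):
    """Sparsemax (entmax with alpha=2): project scores z onto the
    probability simplex. Unlike softmax, low-scoring entries can
    receive exactly zero probability."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]          # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    # Support size: largest k with 1 + k * z_sorted[k-1] > cumsum[k-1]
    k_max = k[1 + k * z_sorted > cumsum][-1]
    tau = (cumsum[k_max - 1] - 1.0) / k_max   # threshold
    return np.maximum(z - tau, 0.0)
```

The zeros produced here are what yield the sparse alignments and short output short-lists the abstract refers to; softmax, by contrast, always returns strictly positive probabilities.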
Privacy-Preserving Causal Inference via Inverse Probability Weighting
Title | Privacy-Preserving Causal Inference via Inverse Probability Weighting |
Authors | Si Kai Lee, Luigi Gresele, Mijung Park, Krikamol Muandet |
Abstract | The use of inverse probability weighting (IPW) methods to estimate the causal effect of treatments from observational studies is widespread in econometrics, medicine and social sciences. Although these studies often involve sensitive information, thus far there has been no work on privacy-preserving IPW methods. We address this by providing a novel framework for privacy-preserving IPW (PP-IPW) methods. We include a theoretical analysis of the effects of our proposed privatisation procedure on the estimated average treatment effect, and evaluate our PP-IPW framework on synthetic, semi-synthetic and real datasets. The empirical results are consistent with our theoretical findings. |
Tasks | Causal Inference |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12592v2 |
PDF | https://arxiv.org/pdf/1905.12592v2.pdf |
PWC | https://paperswithcode.com/paper/private-causal-inference-using-propensity |
Repo | |
Framework | |
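For reference, the non-private IPW estimator that the PP-IPW framework privatises weights each outcome by the inverse of the probability (propensity) of the treatment the unit actually received. A minimal sketch of that baseline; the privatisation procedure itself is deliberately omitted, since the abstract does not describe its details:

```python
import numpy as np

def ipw_ate(y, t, propensity):
    """Inverse-probability-weighted estimate of the average treatment
    effect (ATE). y: outcomes, t: binary treatment indicators,
    propensity: estimated P(T=1 | covariates) for each unit."""
    y, t, e = (np.asarray(a, dtype=float) for a in (y, t, propensity))
    # Treated units weighted by 1/e, controls by 1/(1-e)
    return float(np.mean(t * y / e - (1 - t) * y / (1 - e)))
```

On a toy pair with propensity 0.5 everywhere, `ipw_ate([3.0, 1.0], [1, 0], [0.5, 0.5])` returns 2.0.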
Learning Regional Attraction for Line Segment Detection
Title | Learning Regional Attraction for Line Segment Detection |
Authors | Nan Xue, Song Bai, Fu-Dong Wang, Gui-Song Xia, Tianfu Wu, Liangpei Zhang, Philip H. S. Torr |
Abstract | This paper presents regional attraction of line segment maps, and hereby poses the problem of line segment detection (LSD) as a problem of region coloring. Given a line segment map, the proposed regional attraction first establishes the relationship between line segments and regions in the image lattice. Based on this, the line segment map is equivalently transformed to an attraction field map (AFM), which can be remapped to a set of line segments without loss of information. Accordingly, we develop an end-to-end framework to learn attraction field maps for raw input images, followed by a squeeze module to detect line segments. In contrast to existing works, the proposed detector properly handles local ambiguity and does not rely on the accurate identification of edge pixels. Comprehensive experiments on the Wireframe dataset and the YorkUrban dataset demonstrate the superiority of our method. In particular, we achieve an F-measure of 0.831 on the Wireframe dataset, advancing the state-of-the-art performance by 10.3 percent. |
Tasks | Line Segment Detection |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/1912.09344v1 |
PDF | https://arxiv.org/pdf/1912.09344v1.pdf |
PWC | https://paperswithcode.com/paper/learning-regional-attraction-for-line-segment |
Repo | |
Framework | |
A GMM based algorithm to generate point-cloud and its application to neuroimaging
Title | A GMM based algorithm to generate point-cloud and its application to neuroimaging |
Authors | Liu Yang, Rudrasis Chakraborty |
Abstract | Recent years have witnessed the emergence of 3D medical imaging techniques with the development of 3D sensors and technology. Due to the presence of noise in image acquisition and registration, researchers have focused on alternative ways to represent medical images. One such alternative is to analyze medical imaging by understanding the 3D shapes represented as point-clouds. Though 3D point-cloud processing is not a "go-to" choice in the medical imaging community, it is a "natural" way to capture 3D shapes. However, as the number of samples for medical images is small, researchers have used pre-trained models to fine-tune on medical images. Furthermore, due to the different modalities in medical images, standard generative models cannot be used to generate new samples of medical images. In this work, we take advantage of the point-cloud representation of the 3D structures of medical images and propose a Gaussian mixture model-based generation scheme. Our proposed method is robust to outliers. Experimental validation has been performed to show that the proposed scheme can generate new 3D structures using interpolation techniques, i.e., given two 3D structures represented as point-clouds, we can generate point-clouds in between. We have also generated new point-clouds for subjects with and without dementia and show that the generated samples are indeed closely matched to the respective training samples from the same class. |
Tasks | |
Published | 2019-11-05 |
URL | https://arxiv.org/abs/1911.01705v1 |
PDF | https://arxiv.org/pdf/1911.01705v1.pdf |
PWC | https://paperswithcode.com/paper/a-gmm-based-algorithm-to-generate-point-cloud |
Repo | |
Framework | |
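The generation scheme in the abstract rests on two operations that are easy to sketch: sampling a point-cloud from a fitted GMM, and interpolating between two clouds through their mixture parameters. The sketch below assumes component correspondence between the two mixtures is already given, which the abstract does not say how to obtain:

```python
import numpy as np

def sample_gmm(weights, means, covs, n, rng):
    """Draw an n-point cloud from a Gaussian mixture with the given
    component weights, mean vectors, and covariance matrices."""
    comps = rng.choice(len(weights), size=n, p=weights)
    return np.stack([rng.multivariate_normal(means[c], covs[c]) for c in comps])

def interpolate_means(means_a, means_b, t):
    """Linear interpolation between two aligned sets of component means,
    a simplified stand-in for interpolating between point-cloud GMMs."""
    return (1 - t) * np.asarray(means_a) + t * np.asarray(means_b)
```

Sampling from the mixture with interpolated parameters then gives the "in between" point-clouds described in the abstract.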
Domain Differential Adaptation for Neural Machine Translation
Title | Domain Differential Adaptation for Neural Machine Translation |
Authors | Zi-Yi Dou, Xinyi Wang, Junjie Hu, Graham Neubig |
Abstract | Neural networks are known to be data hungry and domain sensitive, but it is nearly impossible to obtain large quantities of labeled data for every domain we are interested in. This necessitates the use of domain adaptation strategies. One common strategy encourages generalization by aligning the global distribution statistics between source and target domains, but one drawback is that the statistics of different domains or tasks are inherently divergent, and smoothing over these differences can lead to sub-optimal performance. In this paper, we propose the framework of {\it Domain Differential Adaptation (DDA)}, where instead of smoothing over these differences we embrace them, directly modeling the difference between domains using models in a related task. We then use these learned domain differentials to adapt models for the target task accordingly. Experimental results on domain adaptation for neural machine translation demonstrate the effectiveness of this strategy, achieving consistent improvements over other alternative adaptation strategies in multiple experimental settings. |
Tasks | Domain Adaptation, Machine Translation |
Published | 2019-10-07 |
URL | https://arxiv.org/abs/1910.02555v1 |
PDF | https://arxiv.org/pdf/1910.02555v1.pdf |
PWC | https://paperswithcode.com/paper/domain-differential-adaptation-for-neural |
Repo | |
Framework | |
Machine and Deep Learning for Crowd Analytics
Title | Machine and Deep Learning for Crowd Analytics |
Authors | Muhammad Siraj |
Abstract | In high-population cities, gatherings of large crowds in public places and public areas can jeopardize people's safety and transportation, which is a key challenge for researchers. Although much research has been carried out on crowd analytics, many existing methods are problem-specific, i.e., methods learned from a specific scene cannot be properly adapted to other videos. This is a weakness of those studies, since additional training samples have to be found from diverse videos. This paper investigates diverse-scene crowd analytics with traditional and deep learning models, and also considers the pros and cons of these approaches. Once general deep models are trained on large datasets, they can be applied to different crowd videos and images, and can therefore cope with problems including, but not limited to, crowd density estimation, crowd counting, and crowd event recognition. Deep learning models and approaches require large datasets for training and testing. Many datasets are collected with attention to the various problems of building crowd datasets, including manual annotation and increasing the diversity of videos and images. In this paper, we also propose several deep neural network models and training approaches to learn feature representations for crowd analytics. |
Tasks | Density Estimation |
Published | 2019-08-25 |
URL | https://arxiv.org/abs/1909.04150v1 |
PDF | https://arxiv.org/pdf/1909.04150v1.pdf |
PWC | https://paperswithcode.com/paper/machine-and-deep-learning-for-crowd-analytics |
Repo | |
Framework | |
Temporally Consistent Depth Prediction with Flow-Guided Memory Units
Title | Temporally Consistent Depth Prediction with Flow-Guided Memory Units |
Authors | Chanho Eom, Hyunjong Park, Bumsub Ham |
Abstract | Predicting depth from a monocular video sequence is an important task for autonomous driving. Although it has advanced considerably in the past few years, recent methods based on convolutional neural networks (CNNs) discard temporal coherence in the video sequence and estimate depth independently for each frame, which often leads to undesired inconsistent results over time. To address this problem, we propose to memorize temporal consistency in the video sequence, and leverage it for the task of depth prediction. To this end, we introduce a two-stream CNN with a flow-guided memory module, where each stream encodes visual and temporal features, respectively. The memory module, implemented using convolutional gated recurrent units (ConvGRUs), inputs visual and temporal features sequentially together with optical flow tailored to our task. It memorizes trajectories of individual features selectively and propagates spatial information over time, enforcing a long-term temporal consistency to prediction results. We evaluate our method on the KITTI benchmark dataset in terms of depth prediction accuracy, temporal consistency and runtime, and achieve a new state of the art. We also provide an extensive experimental analysis, clearly demonstrating the effectiveness of our approach to memorizing temporal consistency for depth prediction. |
Tasks | Autonomous Driving, Depth Estimation, Optical Flow Estimation |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.07074v1 |
PDF | https://arxiv.org/pdf/1909.07074v1.pdf |
PWC | https://paperswithcode.com/paper/temporally-consistent-depth-prediction-with |
Repo | |
Framework | |
Explainable Text-Driven Neural Network for Stock Prediction
Title | Explainable Text-Driven Neural Network for Stock Prediction |
Authors | Linyi Yang, Zheng Zhang, Su Xiong, Lirui Wei, James Ng, Lina Xu, Ruihai Dong |
Abstract | It has been shown that financial news leads to the fluctuation of stock prices. However, previous work on news-driven financial market prediction focused only on predicting stock price movement without providing an explanation. In this paper, we propose a dual-layer attention-based neural network to address this issue. In the initial stage, we introduce a knowledge-based method to adaptively extract relevant financial news. Then, we use input attention to pay more attention to the more influential news and concatenate the day embeddings with the output of the news representation. Finally, we use an output attention mechanism to allocate different weights to different days in terms of their contribution to stock price movement. Thorough empirical studies based upon historical prices of several individual stocks demonstrate the superiority of our proposed method in stock price prediction compared to state-of-the-art methods. |
Tasks | Stock Prediction, Stock Price Prediction |
Published | 2019-02-13 |
URL | http://arxiv.org/abs/1902.04994v1 |
PDF | http://arxiv.org/pdf/1902.04994v1.pdf |
PWC | https://paperswithcode.com/paper/explainable-text-driven-neural-network-for |
Repo | |
Framework | |
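The output-attention step described in the abstract (allocating different weights to different days according to their contribution) can be sketched as standard dot-product attention over per-day representations. Names and shapes here are illustrative, not the paper's exact architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # shift for numerical stability
    return e / e.sum()

def temporal_attention(day_embeddings, query):
    """Score each day's embedding against a query vector, normalize the
    scores into attention weights, and return the weighted summary."""
    scores = day_embeddings @ query           # one scalar score per day
    weights = softmax(scores)                 # weights sum to 1
    return weights, weights @ day_embeddings  # (weights, context vector)
```

The returned weights are also what makes such a model explainable: they show which days (and hence which news items) drove the prediction.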
ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization
Title | ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization |
Authors | Xiangyi Chen, Sijia Liu, Kaidi Xu, Xingguo Li, Xue Lin, Mingyi Hong, David Cox |
Abstract | The adaptive momentum method (AdaMM), which uses past gradients to update descent directions and learning rates simultaneously, has become one of the most popular first-order optimization methods for solving machine learning problems. However, AdaMM is not suited for solving black-box optimization problems, where explicit gradient forms are difficult or infeasible to obtain. In this paper, we propose a zeroth-order AdaMM (ZO-AdaMM) algorithm that generalizes AdaMM to the gradient-free regime. We show that the convergence rate of ZO-AdaMM for both convex and nonconvex optimization is roughly a factor of $O(\sqrt{d})$ worse than that of the first-order AdaMM algorithm, where $d$ is the problem size. In particular, we provide a deep understanding of why Mahalanobis distance matters in the convergence of ZO-AdaMM and other AdaMM-type methods. As a byproduct, our analysis takes the first step toward understanding adaptive learning rate methods for nonconvex constrained optimization. Furthermore, we demonstrate two applications: designing per-image and universal adversarial attacks on black-box neural networks, respectively. We perform extensive experiments on ImageNet and empirically show that ZO-AdaMM converges much faster to a solution of high accuracy compared with $6$ state-of-the-art ZO optimization methods. |
Tasks | |
Published | 2019-10-15 |
URL | https://arxiv.org/abs/1910.06513v2 |
PDF | https://arxiv.org/pdf/1910.06513v2.pdf |
PWC | https://paperswithcode.com/paper/zo-adamm-zeroth-order-adaptive-momentum |
Repo | |
Framework | |
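The gradient-free regime in the abstract is typically handled with a two-point zeroth-order gradient estimator, which AdaMM-style updates can then consume in place of a true gradient. A generic sketch of such an oracle (the paper's exact estimator and its Mahalanobis-distance analysis are not reproduced here):

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, n_dirs=20, rng=None):
    """Estimate the gradient of a black-box function f at x by averaging
    directional finite differences along random Gaussian directions u:
    g ~ E[(f(x + mu*u) - f(x)) / mu * u]."""
    rng = np.random.default_rng(0) if rng is None else rng
    g = np.zeros_like(x, dtype=float)
    fx = f(x)
    for _ in range(n_dirs):
        u = rng.standard_normal(x.size)
        g += (f(x + mu * u) - fx) / mu * u
    return g / n_dirs
```

The extra variance such estimators introduce as the dimension $d$ grows is the intuition behind the $O(\sqrt{d})$ slowdown quoted in the abstract.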
Astraea: Self-balancing Federated Learning for Improving Classification Accuracy of Mobile Deep Learning Applications
Title | Astraea: Self-balancing Federated Learning for Improving Classification Accuracy of Mobile Deep Learning Applications |
Authors | Moming Duan |
Abstract | Federated learning (FL) is a distributed deep learning method which enables multiple participants, such as mobile phones and IoT devices, to contribute to a neural network model while their private training data remains on local devices. This distributed approach is promising for edge computing systems, which have a large corpus of decentralized data and require high privacy. However, unlike common training datasets, the data distribution in edge computing systems is imbalanced, which introduces bias into model training and causes a decrease in the accuracy of federated learning applications. In this paper, we demonstrate that imbalanced distributed training data causes accuracy degradation in FL. To counter this problem, we build a self-balancing federated learning framework called Astraea, which alleviates the imbalance by 1) global data distribution based data augmentation, and 2) mediator based multi-client rescheduling. Compared with FedAvg, the state-of-the-art FL algorithm, Astraea shows +5.59% and +5.89% improvements in top-1 accuracy on the imbalanced EMNIST and imbalanced CINIC-10 datasets, respectively. Meanwhile, the communication traffic of Astraea can be 92% lower than that of FedAvg. |
Tasks | Data Augmentation |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01132v1 |
PDF | https://arxiv.org/pdf/1907.01132v1.pdf |
PWC | https://paperswithcode.com/paper/astraea-self-balancing-federated-learning-for |
Repo | |
Framework | |
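Astraea's first mechanism (global data distribution based augmentation) can be sketched as: aggregate label counts across all clients, then compute how many extra samples each minority class needs to match the largest class. This is a simplified reading of the abstract; the mediator-based rescheduling is not shown:

```python
from collections import Counter

def augmentation_targets(client_labels):
    """Given each client's list of labels, return per-class counts of
    the extra samples needed to level every class up to the largest."""
    counts = Counter()
    for labels in client_labels:
        counts.update(labels)       # global label distribution
    peak = max(counts.values())
    return {cls: peak - n for cls, n in counts.items()}
```

Each client could then augment its local data (e.g. by transformations of existing samples) toward these per-class targets before training.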
Collaborative Filtering with A Synthetic Feedback Loop
Title | Collaborative Filtering with A Synthetic Feedback Loop |
Authors | Wenlin Wang, Hongteng Xu, Ruiyi Zhang, Wenqi Wang, Lawrence Carin |
Abstract | We propose a novel learning framework for recommendation systems, assisting collaborative filtering with a synthetic feedback loop. The proposed framework consists of a “recommender” and a “virtual user.” The recommender is formalized as a collaborative-filtering method, recommending items according to observed user behavior. The virtual user estimates rewards from the recommended items and generates the influence of the rewards on observed user behavior. The recommender connected with the virtual user constructs a closed loop that recommends items to users and imitates the unobserved feedback of the users to the recommended items. The synthetic feedback is used to augment observed user behavior and improve recommendation results. Such a model can be interpreted as inverse reinforcement learning, which can be learned effectively via rollout (simulation). Experimental results show that the proposed framework is able to boost the performance of existing collaborative filtering methods on multiple datasets. |
Tasks | Recommendation Systems |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.12735v1 |
PDF | https://arxiv.org/pdf/1910.12735v1.pdf |
PWC | https://paperswithcode.com/paper/collaborative-filtering-with-a-synthetic |
Repo | |
Framework | |
Candidate Fusion: Integrating Language Modelling into a Sequence-to-Sequence Handwritten Word Recognition Architecture
Title | Candidate Fusion: Integrating Language Modelling into a Sequence-to-Sequence Handwritten Word Recognition Architecture |
Authors | Lei Kang, Pau Riba, Mauricio Villegas, Alicia Fornés, Marçal Rusiñol |
Abstract | Sequence-to-sequence models have recently become very popular for tackling handwritten word recognition problems. However, how to effectively integrate an external language model into such a recognizer is still a challenging problem. The main challenge faced when training a language model is dealing with a language model corpus that is usually different from the one used for training the handwritten word recognition system. Thus, the bias between the two word corpora leads to incorrect transcriptions, yielding similar or even worse performance on the recognition task. In this work, we introduce Candidate Fusion, a novel way to integrate an external language model into a sequence-to-sequence architecture: it provides suggestions from external language knowledge as a new input to the sequence-to-sequence recognizer. Hence, Candidate Fusion provides two improvements. On the one hand, the sequence-to-sequence recognizer has the flexibility not only to combine the information from itself and the language model, but also to choose the importance of the information provided by the language model. On the other hand, the external language model has the ability to adapt itself to the training corpus and even learn the most common errors produced by the recognizer. Finally, through comprehensive experiments, Candidate Fusion proves to outperform the state-of-the-art language models for handwritten word recognition tasks. |
Tasks | Language Modelling |
Published | 2019-12-21 |
URL | https://arxiv.org/abs/1912.10308v1 |
PDF | https://arxiv.org/pdf/1912.10308v1.pdf |
PWC | https://paperswithcode.com/paper/candidate-fusion-integrating-language |
Repo | |
Framework | |
Generative Modeling and Inverse Imaging of Cardiac Transmembrane Potential
Title | Generative Modeling and Inverse Imaging of Cardiac Transmembrane Potential |
Authors | Sandesh Ghimire, Jwala Dhamala, Prashnna Kumar Gyawali, John L Sapp, B. Milan Horacek, Linwei Wang |
Abstract | Noninvasive reconstruction of cardiac transmembrane potential (TMP) from surface electrocardiograms (ECG) involves an ill-posed inverse problem. Model-constrained regularization is powerful for incorporating rich physiological knowledge about spatiotemporal TMP dynamics. These models are controlled by high-dimensional physical parameters which, if fixed, can introduce model errors and reduce the accuracy of TMP reconstruction. Simultaneous adaptation of these parameters during TMP reconstruction, however, is difficult due to their high dimensionality. We introduce a novel model-constrained inference framework that replaces conventional physiological models with a deep generative model trained to generate TMP sequences from low-dimensional generative factors. Using a variational auto-encoder (VAE) with long short-term memory (LSTM) networks, we train the VAE decoder to learn the conditional likelihood of TMP, while the encoder learns the prior distribution of generative factors. These two components allow us to develop an efficient algorithm to simultaneously infer the generative factors and TMP signals from ECG data. Synthetic and real-data experiments demonstrate that the presented method significantly improves the accuracy of TMP reconstruction compared with methods constrained by conventional physiological models or without physiological constraints. |
Tasks | |
Published | 2019-05-12 |
URL | https://arxiv.org/abs/1905.04803v1 |
PDF | https://arxiv.org/pdf/1905.04803v1.pdf |
PWC | https://paperswithcode.com/paper/generative-modeling-and-inverse-imaging-of |
Repo | |
Framework | |
Can A User Anticipate What Her Followers Want?
Title | Can A User Anticipate What Her Followers Want? |
Authors | Abir De, Adish Singla, Utkarsh Upadhyay, Manuel Gomez-Rodriguez |
Abstract | Whenever a social media user decides to share a story, she is typically pleased to receive likes, comments, shares, or, more generally, feedback from her followers. As a result, she may feel compelled to use the feedback she receives to (re-)estimate her followers’ preferences and decide which stories to share next to receive more (positive) feedback. Under which conditions can she succeed? In this work, we first look into this problem from a theoretical perspective and then provide a set of practical algorithms to identify and characterize such behavior in social media. More specifically, we address the above problem from the viewpoint of sequential decision making and utility maximization. For a wide variety of utility functions, we first show that, to succeed, a user needs to actively trade off exploitation (sharing stories which lead to more positive feedback) and exploration (sharing stories to learn about her followers’ preferences). However, exploration is not necessary if a user utilizes the feedback her followers provide to other users in addition to the feedback she receives. Then, we develop a utility estimation framework for observational data, which relies on statistical hypothesis testing to determine whether a user utilizes the feedback she receives from each of her followers to decide what to post next. Experiments on synthetic data illustrate our theoretical findings and show that our estimation framework is able to accurately recover users’ underlying utility functions. Experiments on several real datasets gathered from Twitter and Reddit reveal that up to 82% (43%) of the Twitter (Reddit) users in our datasets do use the feedback they receive to decide what to post next. |
Tasks | Decision Making |
Published | 2019-09-01 |
URL | https://arxiv.org/abs/1909.00440v2 |
PDF | https://arxiv.org/pdf/1909.00440v2.pdf |
PWC | https://paperswithcode.com/paper/can-a-user-guess-what-her-followers-want |
Repo | |
Framework | |
Towards Combining On-Off-Policy Methods for Real-World Applications
Title | Towards Combining On-Off-Policy Methods for Real-World Applications |
Authors | Kai-Chun Hu, Chen-Huan Pi, Ting Han Wei, I-Chen Wu, Stone Cheng, Yi-Wei Dai, Wei-Yuan Ye |
Abstract | In this paper, we point out a fundamental property of the objective in reinforcement learning, with which we can reformulate the policy gradient objective into a perceptron-like loss function, removing the need to distinguish between on- and off-policy training. Namely, we posit that it is sufficient to only update a policy $\pi$ for cases that satisfy the condition $A(\frac{\pi}{\mu}-1)\leq0$, where $A$ is the advantage, and $\mu$ is another policy. Furthermore, we show via theoretical derivation that a perceptron-like loss function matches the clipped surrogate objective for PPO. With our new formulation, the policies $\pi$ and $\mu$ can be arbitrarily far apart in theory, effectively enabling off-policy training. To examine our derivations, we combine the on-policy PPO clipped surrogate (which we show to be equivalent to one instance of the new formulation) with the off-policy IMPALA method. We first verify the combined method on the OpenAI Gym pendulum toy problem. Next, we use our method to train a quadrotor position controller in a simulator. Our trained policy is efficient and lightweight enough to run on a low-cost microcontroller at a minimum update rate of 500 Hz. For the quadrotor, we show two experiments to verify our method and demonstrate performance: 1) hovering at a fixed position, and 2) tracking along a specific trajectory. In preliminary trials, we are also able to apply the method to a real-world quadrotor. |
Tasks | |
Published | 2019-04-24 |
URL | http://arxiv.org/abs/1904.10642v1 |
PDF | http://arxiv.org/pdf/1904.10642v1.pdf |
PWC | https://paperswithcode.com/paper/towards-combining-on-off-policy-methods-for |
Repo | |
Framework | |
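The update condition $A(\frac{\pi}{\mu}-1)\leq0$ from the abstract, alongside the standard PPO clipped surrogate it is shown to match, can be sketched per-sample as follows. The surrogate below is the usual PPO form; whether it matches the authors' exact notation is an assumption:

```python
import numpy as np

def update_mask(adv, ratio):
    """Perceptron-like rule: a sample contributes to the policy update
    only when A * (pi/mu - 1) <= 0, i.e. the advantage and the ratio's
    deviation from 1 have opposite signs (or either is zero)."""
    return adv * (ratio - 1.0) <= 0.0

def ppo_clip_objective(adv, ratio, eps=0.2):
    """Per-sample PPO clipped surrogate:
    min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    return np.minimum(ratio * adv, np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv)
```

Intuitively, the mask skips samples that the surrogate's clipping would flatten anyway (e.g. a positive-advantage action whose probability has already been pushed up), which is what lets the rule stay valid when $\pi$ and $\mu$ are far apart.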