Paper Group NAWR 42
Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering
Title | Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering |
Authors | Peng Gao, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven C. H. Hoi, Xiaogang Wang, Hongsheng Li |
Abstract | Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method that dynamically fuses multi-modal features with intra- and inter-modality information flow, alternately passing dynamic information between and across the visual and language modalities. It robustly captures the high-level interactions between the language and vision domains, thus significantly improving the performance of visual question answering. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can dynamically modulate the intra-modality attention of the current modality, which is vital for multi-modality feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method. |
Tasks | Question Answering, Visual Question Answering |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Gao_Dynamic_Fusion_With_Intra-_and_Inter-Modality_Attention_Flow_for_Visual_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Gao_Dynamic_Fusion_With_Intra-_and_Inter-Modality_Attention_Flow_for_Visual_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-fusion-with-intra-and-inter-modality-1 |
Repo | https://github.com/bupt-cist/DFAF-for-VQA.pytorch |
Framework | pytorch |
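A minimal PyTorch sketch of one inter-modality attention step (visual features attending to language features), assuming standard scaled dot-product attention. The shapes, weight matrices, and residual fusion are illustrative stand-ins, not the paper's exact DFAF block:

```python
import torch
import torch.nn.functional as F

def inter_modality_attention(vis, lang, w_q, w_k, w_v):
    """vis: (B, Nv, D) region features; lang: (B, Nl, D) word features."""
    q = vis @ w_q          # queries come from the visual modality
    k = lang @ w_k         # keys come from the language modality
    v = lang @ w_v         # values come from the language modality
    attn = F.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
    return vis + attn @ v  # residual fusion of language info into vision

B, Nv, Nl, D = 2, 36, 14, 512
w_q, w_k, w_v = (torch.randn(D, D) * D ** -0.5 for _ in range(3))
out = inter_modality_attention(torch.randn(B, Nv, D), torch.randn(B, Nl, D),
                               w_q, w_k, w_v)
print(out.shape)  # torch.Size([2, 36, 512])
```

Swapping which modality supplies the queries gives the opposite flow direction; the intra-modality variant applies the same attention within a single modality, modulated by the other.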
RedTyp: A Database of Reduplication with Computational Models
Title | RedTyp: A Database of Reduplication with Computational Models |
Authors | Hossep Dolatian, Jeffrey Heinz |
Abstract | |
Tasks | |
Published | 2019-01-01 |
URL | https://www.aclweb.org/anthology/W19-0102/ |
https://www.aclweb.org/anthology/W19-0102 | |
PWC | https://paperswithcode.com/paper/redtyp-a-database-of-reduplication-with |
Repo | https://github.com/jhdeov/RedTyp |
Framework | none |
An Encoding Strategy Based Word-Character LSTM for Chinese NER
Title | An Encoding Strategy Based Word-Character LSTM for Chinese NER |
Authors | Wei Liu, Tongge Xu, Qinghua Xu, Jiayu Song, Yueran Zu |
Abstract | A recently proposed lattice model has demonstrated that words in a character sequence can provide rich word boundary information for character-based Chinese NER models. In this model, word information is integrated into a shortcut path between the start and the end characters of the word. However, the existence of shortcut paths may cause the model to degenerate into a partial word-based model, which suffers from word segmentation errors. Furthermore, the lattice model cannot be trained in batches due to its DAG structure. In this paper, we propose a novel word-character LSTM (WC-LSTM) model that adds word information to the start or the end character of the word, alleviating the influence of word segmentation errors while retaining word boundary information. Four different strategies are explored in our model to encode word information into a fixed-sized representation for efficient batch training. Experiments on benchmark datasets show that our proposed model outperforms other state-of-the-art models. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1247/ |
https://www.aclweb.org/anthology/N19-1247 | |
PWC | https://paperswithcode.com/paper/an-encoding-strategy-based-word-character |
Repo | https://github.com/liuwei1206/CCW-NER |
Framework | pytorch |
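The abstract does not name the four encoding strategies, so the following is a hedged sketch of one plausible fixed-size strategy (mean-pooling the embeddings of all lexicon words ending at a character); `words_ending_here` and the concatenation scheme are hypothetical:

```python
import torch

def mean_word_encoding(char_emb, words_ending_here, word_emb):
    """char_emb: (D,); words_ending_here: list of word ids; word_emb: (V, D)."""
    if not words_ending_here:
        pooled = torch.zeros_like(char_emb)  # no lexicon match at this position
    else:
        pooled = word_emb[words_ending_here].mean(dim=0)
    # character and pooled word vectors concatenated -> fixed size, batchable
    return torch.cat([char_emb, pooled], dim=-1)
```

Because the pooled vector has a fixed size no matter how many lexicon words match, sequences can be padded and batched normally, unlike the DAG-structured lattice model.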
Progressive Fusion Video Super-Resolution Network via Exploiting Non-Local Spatio-Temporal Correlations
Title | Progressive Fusion Video Super-Resolution Network via Exploiting Non-Local Spatio-Temporal Correlations |
Authors | Peng Yi, Zhongyuan Wang, Kui Jiang, Junjun Jiang, Jiayi Ma |
Abstract | How to effectively fuse temporal information from consecutive frames plays an important role in video super-resolution (SR), yet most previous fusion strategies either fail to fully utilize temporal information or cost too much time. In this study, we propose a novel progressive fusion network for video SR, which is designed to make better use of spatio-temporal information and proves more efficient and effective than existing direct fusion, slow fusion, or 3D convolution strategies. Under this progressive fusion framework, we further introduce an improved non-local operation to avoid the complex motion estimation and motion compensation (ME&MC) procedures used in previous video SR approaches. Extensive experiments on public datasets demonstrate that our method surpasses the state of the art by 0.96 dB on average, runs about 3 times faster, and requires only about half of the parameters. |
Tasks | Motion Compensation, Motion Estimation, Super-Resolution, Video Super-Resolution |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Yi_Progressive_Fusion_Video_Super-Resolution_Network_via_Exploiting_Non-Local_Spatio-Temporal_Correlations_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Yi_Progressive_Fusion_Video_Super-Resolution_Network_via_Exploiting_Non-Local_Spatio-Temporal_Correlations_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/progressive-fusion-video-super-resolution |
Repo | https://github.com/psychopa4/PFNL |
Framework | tf |
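A minimal sketch of a generic non-local block in the style of the self-similarity operations this paper builds on; the paper's improved variant reduces the computation differently, so treat this as background rather than the proposed module:

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.theta = nn.Conv2d(c, c // 2, 1)
        self.phi = nn.Conv2d(c, c // 2, 1)
        self.g = nn.Conv2d(c, c // 2, 1)
        self.out = nn.Conv2d(c // 2, c, 1)

    def forward(self, x):                    # x: (B, C, H, W); consecutive
        b, c, h, w = x.shape                 # frames can be stacked along B
        q = self.theta(x).flatten(2).transpose(1, 2)  # (B, HW, C/2)
        k = self.phi(x).flatten(2)                    # (B, C/2, HW)
        v = self.g(x).flatten(2).transpose(1, 2)      # (B, HW, C/2)
        attn = torch.softmax(q @ k, dim=-1)           # pairwise similarities
        y = (attn @ v).transpose(1, 2).reshape(b, c // 2, h, w)
        return x + self.out(y)                        # residual connection
```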
Attentive Feedback Network for Boundary-Aware Salient Object Detection
Title | Attentive Feedback Network for Boundary-Aware Salient Object Detection |
Authors | Mengyang Feng, Huchuan Lu, Errui Ding |
Abstract | Recent deep learning based salient object detection methods achieve gratifying performance built upon Fully Convolutional Neural Networks (FCNs). However, most of them suffer from the boundary challenge: state-of-the-art methods employ feature aggregation techniques and can precisely locate salient objects, but they often fail to segment out the entire object with fine boundaries, especially thin, narrow stripes, so there is still large room for improvement over FCN-based models. In this paper, we design Attentive Feedback Modules (AFMs) to better explore the structure of objects. A Boundary-Enhanced Loss (BEL) is further employed for learning exquisite boundaries. Our proposed deep model produces satisfying results on object boundaries and achieves state-of-the-art performance on five widely tested salient object detection benchmarks. The network is fully convolutional, runs at 26 FPS, and does not need any post-processing. |
Tasks | Object Detection, Salient Object Detection |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Feng_Attentive_Feedback_Network_for_Boundary-Aware_Salient_Object_Detection_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Feng_Attentive_Feedback_Network_for_Boundary-Aware_Salient_Object_Detection_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/attentive-feedback-network-for-boundary-aware |
Repo | https://github.com/ArcherFMY/AFNet |
Framework | none |
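The abstract does not give the exact form of the Boundary-Enhanced Loss, so below is a hedged sketch of one common way to build such a loss: extract soft contours with a morphological gradient (dilation minus erosion, both via max-pooling) and penalize the contour discrepancy:

```python
import torch
import torch.nn.functional as F

def soft_boundary(mask, k=3):
    """mask: (B, 1, H, W) in [0, 1]; returns a soft contour map."""
    dilated = F.max_pool2d(mask, k, stride=1, padding=k // 2)
    eroded = -F.max_pool2d(-mask, k, stride=1, padding=k // 2)  # min-pool
    return dilated - eroded  # morphological gradient highlights boundaries

def boundary_enhanced_loss(pred, gt):
    # penalize disagreement between predicted and ground-truth contours
    return F.l1_loss(soft_boundary(pred), soft_boundary(gt))
```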
Superset Technique for Approximate Recovery in One-Bit Compressed Sensing
Title | Superset Technique for Approximate Recovery in One-Bit Compressed Sensing |
Authors | Larkin Flodin, Venkata Gandikota, Arya Mazumdar |
Abstract | One-bit compressed sensing (1bCS) is a method of signal acquisition under extreme measurement quantization that gives important insights into the limits of signal compression and analog-to-digital conversion. The setting is also equivalent to the problem of learning a sparse hyperplane classifier. In this paper, we propose a generic approach for signal recovery in nonadaptive 1bCS that leads to improved sample complexity for approximate recovery under a variety of signal models, including nonnegative signals and binary signals. We construct 1bCS matrices that are universal, i.e., they work for all signals under a model, and at the same time recover very general random sparse signals with high probability. In our approach, we divide the set of samples (measurements) into two parts, and use the first part to recover a superset of the support of the sparse vector. The second set of measurements is then used to approximate the signal within the superset. While support recovery in 1bCS is well studied, recovering a superset of the support requires fewer samples, which in turn leads to an overall reduction in sample complexity for approximate recovery. |
Tasks | Quantization |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9226-superset-technique-for-approximate-recovery-in-one-bit-compressed-sensing |
http://papers.nips.cc/paper/9226-superset-technique-for-approximate-recovery-in-one-bit-compressed-sensing.pdf | |
PWC | https://paperswithcode.com/paper/superset-technique-for-approximate-recovery |
Repo | https://github.com/flodinl/neurips-1bCS |
Framework | none |
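A NumPy sketch of the two-stage measurement split only. The coordinate-scoring and within-superset estimation rules below are generic stand-ins (plain correlation with the sign measurements), not the paper's universal matrix constructions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m1, m2 = 200, 5, 150, 150
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x /= np.linalg.norm(x)                       # unit-norm k-sparse signal

A1, A2 = rng.standard_normal((m1, n)), rng.standard_normal((m2, n))
y1, y2 = np.sign(A1 @ x), np.sign(A2 @ x)    # one-bit measurements

# Stage 1: score coordinates by correlation with the sign measurements and
# keep a superset (here 2k coordinates) intended to contain the true support.
superset = np.argsort(-np.abs(A1.T @ y1))[: 2 * k]

# Stage 2: estimate within the superset using the second measurement batch
# (correlation restricted to the superset -- a simple proxy estimator).
x_hat = np.zeros(n)
x_hat[superset] = A2[:, superset].T @ y2
x_hat /= np.linalg.norm(x_hat)
print("cosine similarity:", float(x_hat @ x))
```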
Learning Neural Representations for Network Anomaly Detection
Title | Learning Neural Representations for Network Anomaly Detection |
Authors | Van Loi Cao, Miguel Nicolau, James McDermott |
Abstract | This paper proposes latent representation models for improving network anomaly detection. Well-known anomaly detection algorithms often suffer from challenges posed by network data, such as high dimensionality and sparsity, and a lack of anomaly data for training, model selection, and hyperparameter tuning. Our approach introduces new regularizers into a classical autoencoder (AE) and a variational AE, which force normal data into a very tight area centered at the origin, in the non-saturating region of the bottleneck unit activations. Trained on normal data, these AEs push normal points toward the origin, whereas anomalies, which differ from normal data, end up far away from the normal region. The models are very different from common regularized AEs, such as sparse and contractive AEs, in which the regularizers make the latent representation less sensitive to changes in the input data. The bottleneck feature space is then used as a new data representation, and a number of one-class learning algorithms are used to evaluate the proposed models. The experiments show that our models help these classifiers perform efficiently and consistently on high-dimensional, sparse network datasets, even with relatively few training points. More importantly, the models can minimize the effect of model selection on these classifiers, since their performance is insensitive to a wide range of hyperparameter settings. |
Tasks | Anomaly Detection, Intrusion Detection, Model Selection, Unsupervised Anomaly Detection |
Published | 2019-08-01 |
URL | https://ieeexplore.ieee.org/abstract/document/8386786 |
PWC | https://paperswithcode.com/paper/learning-neural-representations-for-network |
Repo | https://github.com/vanloicao/SAEDVAE |
Framework | tf |
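A minimal sketch of the core idea: a classical autoencoder whose loss adds a term pulling the latent codes of normal traffic toward the origin, so anomalies land far from the normal region. The architecture and the weight `alpha` are illustrative:

```python
import torch
import torch.nn as nn

class ShrinkAE(nn.Module):
    def __init__(self, d_in, d_z=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 64), nn.Tanh(),
                                 nn.Linear(64, d_z), nn.Tanh())
        self.dec = nn.Sequential(nn.Linear(d_z, 64), nn.Tanh(),
                                 nn.Linear(64, d_in))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

def loss_fn(x, x_rec, z, alpha=1.0):
    # reconstruction + pull normal codes into a tight ball around the origin
    return ((x - x_rec) ** 2).mean() + alpha * (z ** 2).sum(dim=1).mean()
```

The tanh bottleneck keeps codes near the origin in its non-saturating region, matching the behavior the abstract describes.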
DF2Net: A Dense-Fine-Finer Network for Detailed 3D Face Reconstruction
Title | DF2Net: A Dense-Fine-Finer Network for Detailed 3D Face Reconstruction |
Authors | Xiaoxing Zeng, Xiaojiang Peng, Yu Qiao |
Abstract | Reconstructing detailed geometric structure from a single face image is a challenging problem due to its ill-posed nature and the fine 3D structures to be recovered. This paper proposes a deep Dense-Fine-Finer Network (DF2Net) to address this problem. DF2Net decomposes the reconstruction process into three stages, each processed by an elaborately designed network, namely D-Net, F-Net, and Fr-Net. D-Net exploits a U-net architecture to map the input image to a dense depth image. F-Net refines the output of D-Net by integrating features from the depth and RGB domains, and its output is further enhanced by Fr-Net with a novel multi-resolution hypercolumn architecture. In addition, we introduce three types of data to train these networks: 3D-model synthetic data, 2D-image reconstructed data, and fine facial images. We elaborately exploit different datasets (or combinations) together with well-designed losses to train the different networks. Qualitative evaluation indicates that our DF2Net can effectively reconstruct subtle facial details such as small crow's feet and wrinkles. DF2Net achieves performance superior or comparable to state-of-the-art algorithms in qualitative and quantitative analyses on real-world images and the BU-3DFE dataset. Code and the collected 70K image-depth data will be publicly available. |
Tasks | 3D Face Reconstruction, Face Reconstruction |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Zeng_DF2Net_A_Dense-Fine-Finer_Network_for_Detailed_3D_Face_Reconstruction_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Zeng_DF2Net_A_Dense-Fine-Finer_Network_for_Detailed_3D_Face_Reconstruction_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/df2net-a-dense-fine-finer-network-for |
Repo | https://github.com/xiaoxingzeng/DF2Net |
Framework | pytorch |
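A schematic sketch of the three-stage cascade described in the abstract: D-Net maps RGB to a dense depth map, F-Net refines it from RGB plus depth, and Fr-Net refines further. The `Stage` module is a placeholder, not the paper's U-net or hypercolumn architectures:

```python
import torch
import torch.nn as nn

class Stage(nn.Module):
    """Placeholder conv block standing in for D-Net / F-Net / Fr-Net."""
    def __init__(self, c_in):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(c_in, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 1, 3, padding=1))
    def forward(self, x):
        return self.net(x)

d_net, f_net, fr_net = Stage(3), Stage(4), Stage(4)
rgb = torch.randn(1, 3, 128, 128)
depth = d_net(rgb)                                  # dense depth from RGB
depth = depth + f_net(torch.cat([rgb, depth], 1))   # fuse RGB + depth features
depth = depth + fr_net(torch.cat([rgb, depth], 1))  # finer residual refinement
```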
Bidirectional Transition-Based Dependency Parsing
Title | Bidirectional Transition-Based Dependency Parsing |
Authors | Yunzhe Yuan, Yong Jiang, Kewei Tu |
Abstract | Transition-based dependency parsing is a fast and effective approach for dependency parsing. Traditionally, a transition-based dependency parser processes an input sentence and predicts a sequence of parsing actions in a left-to-right manner. During this process, an early prediction error may negatively impact the prediction of subsequent actions. In this paper, we propose a simple framework for bidirectional transition-based parsing. During training, we learn a left-to-right parser and a right-to-left parser separately. To parse a sentence, we perform joint decoding with the two parsers. We propose three joint decoding algorithms based on joint scoring, dual decomposition, and dynamic oracle, respectively. Empirical results show that our methods lead to competitive parsing accuracy, and our method based on dynamic oracle consistently achieves the best performance. |
Tasks | Dependency Parsing, Transition-Based Dependency Parsing |
Published | 2019-07-17 |
URL | https://aaai.org/ojs/index.php/AAAI/article/view/4733 |
https://aaai.org/ojs/index.php/AAAI/article/view/4733/4611 | |
PWC | https://paperswithcode.com/paper/bidirectional-transition-based-dependency |
Repo | https://github.com/yuanyunzhe/bi-trans-parser |
Framework | none |
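A sketch of the simplest of the three decoding schemes, joint scoring: collect candidate trees from both directional parsers and keep the one whose summed scores are highest. The `k_best`/`score` parser API is hypothetical:

```python
def joint_score_decode(sentence, l2r_parser, r2l_parser, k=8):
    # candidates pooled from both directions (hypothetical k-best interface)
    candidates = l2r_parser.k_best(sentence, k) + r2l_parser.k_best(sentence, k)
    # pick the tree that both parsers jointly prefer
    return max(candidates,
               key=lambda tree: l2r_parser.score(sentence, tree)
                              + r2l_parser.score(sentence, tree))
```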
Reconciling λ-Returns with Experience Replay
Title | Reconciling λ-Returns with Experience Replay |
Authors | Brett Daley, Christopher Amato |
Abstract | Modern deep reinforcement learning methods have departed from the incremental learning required for eligibility traces, rendering the implementation of the λ-return difficult in this context. In particular, off-policy methods that utilize experience replay remain problematic because their random sampling of minibatches is not conducive to the efficient calculation of λ-returns. Yet replay-based methods are often the most sample-efficient, and incorporating λ-returns into them is a viable way to achieve new state-of-the-art performance. To this end, we propose the first method to enable practical use of λ-returns in arbitrary replay-based methods without relying on other forms of decorrelation such as asynchronous gradient updates. By promoting short sequences of past transitions into a small cache within the replay memory, adjacent λ-returns can be efficiently precomputed by sharing Q-values. Computation is not wasted on experiences that are never sampled, and stored λ-returns behave as stable temporal-difference (TD) targets that replace the target network. Additionally, our method grants the unique ability to observe TD errors prior to sampling; for the first time, transitions can be prioritized by their true significance rather than by a proxy to it. Furthermore, we propose the novel use of the TD error to dynamically select λ-values that facilitate faster learning. We show that these innovations can enhance the performance of DQN when playing Atari 2600 games, even under partial observability. While our work specifically focuses on λ-returns, these ideas are applicable to any multi-step return estimator. |
Tasks | Atari Games |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8397-reconciling-returns-with-experience-replay |
http://papers.nips.cc/paper/8397-reconciling-returns-with-experience-replay.pdf | |
PWC | https://paperswithcode.com/paper/reconciling-returns-with-experience-replay |
Repo | https://github.com/brett-daley/dqn-lambda |
Framework | tf |
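A sketch of how λ-returns can be precomputed in one backward pass over a cached sequence, using a standard recursive formulation G_t = r_t + γ[(1−λ)·max_a Q(s_{t+1}, a) + λ·G_{t+1}]; `q_max_next` would come from evaluating the network once over the cached states:

```python
import numpy as np

def lambda_returns(rewards, q_max_next, dones, gamma=0.99, lam=0.8):
    """rewards, q_max_next, dones: arrays over a cached sequence of length T."""
    T = len(rewards)
    G = np.empty(T)
    running = q_max_next[-1]                   # bootstrap beyond the cached block
    for t in reversed(range(T)):
        bootstrap = 0.0 if dones[t] else q_max_next[t]
        running = 0.0 if dones[t] else running  # no bootstrapping past episode ends
        running = rewards[t] + gamma * ((1 - lam) * bootstrap + lam * running)
        G[t] = running
    return G

G = lambda_returns(np.array([0.0, 0.0, 1.0]),        # rewards in the block
                   np.array([0.5, 0.7, 0.0]),        # max_a Q(s_{t+1}, a)
                   np.array([False, False, True]))   # episode terminations
```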
Learning an Event Sequence Embedding for Dense Event-Based Deep Stereo
Title | Learning an Event Sequence Embedding for Dense Event-Based Deep Stereo |
Authors | Stepan Tulyakov, Francois Fleuret, Martin Kiefel, Peter Gehler, Michael Hirsch |
Abstract | Today, the frame-based camera is the sensor of choice for machine vision applications. However, these cameras, originally developed for acquiring static images rather than sensing dynamic, uncontrolled visual environments, suffer from high power consumption, high data rates, high latency, and low dynamic range. An event-based image sensor addresses these drawbacks by mimicking a biological retina: instead of measuring the intensity of every pixel at a fixed time interval, it reports events of significant pixel intensity change. Every such event is represented by its position, sign of change, and timestamp, accurate to the microsecond. Asynchronous event sequences require special handling, since traditional algorithms work only with synchronous, spatially gridded data. To address this problem, we introduce a new module for event sequence embedding, for use in different applications. The module builds a representation of an event sequence by first aggregating information locally across time, using a novel fully-connected layer for an irregularly sampled continuous domain, and then across the discrete spatial domain. Based on this module, we design a deep learning-based stereo method for event-based cameras. The proposed method is the first learning-based stereo method for an event-based camera and the only method that produces dense results. We show large performance increases on the Multi Vehicle Stereo Event Camera Dataset (MVSEC), which has become the standard benchmark for event-based stereo methods. |
Tasks | |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Tulyakov_Learning_an_Event_Sequence_Embedding_for_Dense_Event-Based_Deep_Stereo_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Tulyakov_Learning_an_Event_Sequence_Embedding_for_Dense_Event-Based_Deep_Stereo_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/learning-an-event-sequence-embedding-for |
Repo | https://github.com/tlkvstepan/event_stereo_ICCV2019 |
Framework | pytorch |
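A hedged sketch of the embedding idea only: aggregate each pixel's most recent events across time, then across space. The paper's novel fully-connected layer for an irregularly sampled continuous domain is replaced here by an ordinary MLP plus max-pool placeholder:

```python
import torch
import torch.nn as nn

class EventEmbedding(nn.Module):
    def __init__(self, d=16):
        super().__init__()
        self.temporal = nn.Sequential(nn.Linear(2, d), nn.ReLU(),
                                      nn.Linear(d, d))
        self.spatial = nn.Conv2d(d, d, 3, padding=1)

    def forward(self, ev):        # ev: (B, H, W, K, 2) = K most recent events
        f = self.temporal(ev)     # embed each (rel. time, polarity) pair
        f = f.amax(dim=3)         # aggregate locally across time (max over K)
        f = f.permute(0, 3, 1, 2)                  # to (B, d, H, W)
        return self.spatial(f)    # aggregate across the discrete spatial domain
```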
Offline Contextual Bandits with High Probability Fairness Guarantees
Title | Offline Contextual Bandits with High Probability Fairness Guarantees |
Authors | Blossom Metevier, Stephen Giguere, Sarah Brockman, Ari Kobren, Yuriy Brun, Emma Brunskill, Philip S. Thomas |
Abstract | We present RobinHood, an offline contextual bandit algorithm designed to satisfy a broad family of fairness constraints. Our algorithm accepts multiple fairness definitions and allows users to construct their own unique fairness definitions for the problem at hand. We provide a theoretical analysis of RobinHood, which includes a proof that it will not return an unfair solution with probability greater than a user-specified threshold. We validate our algorithm on three applications: a tutoring system in which we conduct a user study and consider multiple unique fairness definitions; a loan approval setting (using the Statlog German credit data set) in which well-known fairness definitions are applied; and criminal recidivism (using data released by ProPublica). In each setting, our algorithm is able to produce fair policies that achieve performance competitive with other offline and online contextual bandit algorithms. |
Tasks | Multi-Armed Bandits |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9630-offline-contextual-bandits-with-high-probability-fairness-guarantees |
http://papers.nips.cc/paper/9630-offline-contextual-bandits-with-high-probability-fairness-guarantees.pdf | |
PWC | https://paperswithcode.com/paper/offline-contextual-bandits-with-high |
Repo | https://github.com/sgiguere/RobinHood-NeurIPS-2019 |
Framework | none |
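A sketch of the high-probability guarantee mechanism in the Seldonian style: return a candidate policy only if a high-confidence upper bound on its unfairness, computed on held-out data, is non-positive; otherwise report "no solution found." The Hoeffding bound is a generic stand-in for the paper's concentration inequalities:

```python
import numpy as np

def high_confidence_upper_bound(g_samples, delta=0.05, g_range=1.0):
    """Hoeffding upper bound on E[g]; g > 0 means the constraint is violated."""
    n = len(g_samples)
    return g_samples.mean() + g_range * np.sqrt(np.log(1 / delta) / (2 * n))

def safety_test(policy, g_estimator, held_out, delta=0.05):
    # g_estimator(policy, x): unbiased per-sample estimate of unfairness g
    g = np.array([g_estimator(policy, x) for x in held_out])
    if high_confidence_upper_bound(g, delta) <= 0.0:
        return policy
    return None  # "no solution found" -- never return a likely-unfair policy
```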
BehaveNet: nonlinear embedding and Bayesian neural decoding of behavioral videos
Title | BehaveNet: nonlinear embedding and Bayesian neural decoding of behavioral videos |
Authors | Eleanor Batty, Matthew Whiteway, Shreya Saxena, Dan Biderman, Taiga Abe, Simon Musall, Winthrop Gillis, Jeffrey Markowitz, Anne Churchland, John P. Cunningham, Sandeep R. Datta, Scott Linderman, Liam Paninski |
Abstract | A fundamental goal of systems neuroscience is to understand the relationship between neural activity and behavior. Behavior has traditionally been characterized by low-dimensional, task-related variables such as movement speed or response times. More recently, there has been a growing interest in automated analysis of high-dimensional video data collected during experiments. Here we introduce a probabilistic framework for the analysis of behavioral video and neural activity. This framework provides tools for compression, segmentation, generation, and decoding of behavioral videos. Compression is performed using a convolutional autoencoder (CAE), which yields a low-dimensional continuous representation of behavior. We then use an autoregressive hidden Markov model (ARHMM) to segment the CAE representation into discrete “behavioral syllables.” The resulting generative model can be used to simulate behavioral video data. Finally, based on this generative model, we develop a novel Bayesian decoding approach that takes in neural activity and outputs probabilistic estimates of the full-resolution behavioral video. We demonstrate this framework on two different experimental paradigms using distinct behavioral and neural recording technologies. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9701-behavenet-nonlinear-embedding-and-bayesian-neural-decoding-of-behavioral-videos |
http://papers.nips.cc/paper/9701-behavenet-nonlinear-embedding-and-bayesian-neural-decoding-of-behavioral-videos.pdf | |
PWC | https://paperswithcode.com/paper/behavenet-nonlinear-embedding-and-bayesian |
Repo | https://github.com/ebatty/behavenet |
Framework | pytorch |
Graph-based Dependency Parsing with Graph Neural Networks
Title | Graph-based Dependency Parsing with Graph Neural Networks |
Authors | Tao Ji, Yuanbin Wu, Man Lan |
Abstract | We investigate the problem of efficiently incorporating high-order features into neural graph-based dependency parsing. Instead of explicitly extracting high-order features from intermediate parse trees, we develop a more powerful dependency-tree node representation which captures high-order information concisely and efficiently. We use graph neural networks (GNNs) to learn the representations and discuss several new configurations of the GNN's updating and aggregation functions. Experiments show that our parser achieves the best UAS and LAS on PTB (96.0%, 94.3%) among systems that use no external resources. |
Tasks | Dependency Parsing |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1237/ |
https://www.aclweb.org/anthology/P19-1237 | |
PWC | https://paperswithcode.com/paper/graph-based-dependency-parsing-with-graph |
Repo | https://github.com/AntNLP/gnn-dep-parsing |
Framework | none |
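A minimal sketch of the core idea: word representations updated by a GNN layer whose soft adjacency comes from the current arc-attention scores, so each update mixes in high-order head/child information. The plain weighted-sum aggregation below is just one of many possible configurations:

```python
import torch
import torch.nn as nn

class SoftGraphLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.arc_q = nn.Linear(d, d)
        self.arc_k = nn.Linear(d, d)
        self.update = nn.Linear(2 * d, d)

    def forward(self, h):                       # h: (B, N, d) word states
        scores = self.arc_q(h) @ self.arc_k(h).transpose(1, 2)
        adj = torch.softmax(scores, dim=-1)     # soft head distribution per word
        msg = adj @ h                           # expected head representation
        return torch.tanh(self.update(torch.cat([h, msg], dim=-1)))
```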
Molecule Property Prediction Based on Spatial Graph Embedding
Title | Molecule Property Prediction Based on Spatial Graph Embedding |
Authors | Xiaofeng Wang, Zhen Li, Mingjian Jiang, Shuang Wang, Shugang Zhang, Zhiqiang Wei |
Abstract | Accurate prediction of molecular properties is important for the design of new compounds, a crucial step in drug discovery. In this paper, molecular graph data is used for property prediction based on graph convolutional neural networks. A convolution spatial graph embedding layer (C-SGEL) is introduced to retain the spatial connectivity information of molecules, and multiple C-SGELs are stacked to construct a convolution spatial graph embedding network (C-SGEN) for end-to-end representation learning. To enhance the robustness of the network, molecular fingerprints are also combined with C-SGEN to build a composite model for predicting molecular properties. Our comparative experiments show that our method is accurate and achieves the best results on several open benchmark datasets. |
Tasks | Drug Discovery, Graph Embedding, Graph Regression, Representation Learning |
Published | 2019-08-22 |
URL | https://doi.org/10.1021/acs.jcim.9b00410 |
https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.9b00410?rand=oin4mnup | |
PWC | https://paperswithcode.com/paper/molecule-property-prediction-based-on-spatial |
Repo | https://github.com/wxfsd/C-SGEN |
Framework | pytorch |
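A schematic sketch: a graph-convolution-style layer over the molecular adjacency matrix (retaining spatial connectivity), stacked and then concatenated with a fingerprint vector for the composite prediction. Layer details are illustrative, not the paper's exact C-SGEL:

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, x, adj):       # x: (N, d_in); adj: (N, N) with self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        return torch.relu(self.lin(adj @ x / deg))   # mean over neighbors

def predict(atom_feats, adj, fingerprint, layers, head):
    # e.g. layers = [GraphConv(39, 64), GraphConv(64, 64)];
    #      head = nn.Linear(64 + fingerprint_dim, 1)
    h = atom_feats
    for layer in layers:             # stacked C-SGEL-like layers
        h = layer(h, adj)
    graph_vec = h.mean(dim=0)        # readout: average atom embeddings
    return head(torch.cat([graph_vec, fingerprint]))  # composite model
```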