Paper Group NAWR 42
Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering
Title | Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering |
Authors | Peng Gao, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven C. H. Hoi, Xiaogang Wang, Hongsheng Li |
Abstract | Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method that dynamically fuses multi-modal features with intra- and inter-modality information flow, alternately passing dynamic information between and across the visual and language modalities. It robustly captures the high-level interactions between the language and vision domains, thus significantly improving the performance of visual question answering. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can dynamically modulate the intra-modality attention of the current modality, which is vital for multi-modality feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method. |
Tasks | Question Answering, Visual Question Answering |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Gao_Dynamic_Fusion_With_Intra-_and_Inter-Modality_Attention_Flow_for_Visual_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Gao_Dynamic_Fusion_With_Intra-_and_Inter-Modality_Attention_Flow_for_Visual_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-fusion-with-intra-and-inter-modality-1 |
Repo | https://github.com/bupt-cist/DFAF-for-VQA.pytorch |
Framework | pytorch |
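A minimal PyTorch sketch of one inter-modality attention step (visual features attending to language features), assuming standard scaled dot-product attention. The shapes, weight matrices, and residual fusion are illustrative stand-ins, not the paper's exact DFAF block:

```python
import torch
import torch.nn.functional as F

def inter_modality_attention(vis, lang, w_q, w_k, w_v):
    """vis: (B, Nv, D) region features; lang: (B, Nl, D) word features."""
    q = vis @ w_q          # queries come from the visual modality
    k = lang @ w_k         # keys come from the language modality
    v = lang @ w_v         # values come from the language modality
    attn = F.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
    return vis + attn @ v  # residual fusion of language info into vision

B, Nv, Nl, D = 2, 36, 14, 512
w_q, w_k, w_v = (torch.randn(D, D) * D ** -0.5 for _ in range(3))
out = inter_modality_attention(torch.randn(B, Nv, D), torch.randn(B, Nl, D),
                               w_q, w_k, w_v)
print(out.shape)  # torch.Size([2, 36, 512])
```

Swapping which modality supplies the queries gives the opposite flow direction; the intra-modality variant applies the same attention within a single modality, modulated by the other.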
RedTyp: A Database of Reduplication with Computational Models
Title | RedTyp: A Database of Reduplication with Computational Models |
Authors | Hossep Dolatian, Jeffrey Heinz |
Abstract | |
Tasks | |
Published | 2019-01-01 |
URL | https://www.aclweb.org/anthology/W19-0102/ |
https://www.aclweb.org/anthology/W19-0102 | |
PWC | https://paperswithcode.com/paper/redtyp-a-database-of-reduplication-with |
Repo | https://github.com/jhdeov/RedTyp |
Framework | none |
An Encoding Strategy Based Word-Character LSTM for Chinese NER
Title | An Encoding Strategy Based Word-Character LSTM for Chinese NER |
Authors | Wei Liu, Tongge Xu, Qinghua Xu, Jiayu Song, Yueran Zu |
Abstract | A recently proposed lattice model has demonstrated that words in a character sequence can provide rich word boundary information for character-based Chinese NER models. In this model, word information is integrated into a shortcut path between the start and the end characters of the word. However, the existence of shortcut paths may cause the model to degenerate into a partial word-based model, which suffers from word segmentation errors. Furthermore, the lattice model cannot be trained in batches due to its DAG structure. In this paper, we propose a novel word-character LSTM (WC-LSTM) model that adds word information to the start or the end character of the word, alleviating the influence of word segmentation errors while retaining word boundary information. Four different strategies are explored in our model to encode word information into a fixed-sized representation for efficient batch training. Experiments on benchmark datasets show that our proposed model outperforms other state-of-the-art models. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1247/ |
https://www.aclweb.org/anthology/N19-1247 | |
PWC | https://paperswithcode.com/paper/an-encoding-strategy-based-word-character |
Repo | https://github.com/liuwei1206/CCW-NER |
Framework | pytorch |
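The abstract does not name the four encoding strategies, so the following is a hedged sketch of one plausible fixed-size strategy (mean-pooling the embeddings of all lexicon words ending at a character); `words_ending_here` and the concatenation scheme are hypothetical:

```python
import torch

def mean_word_encoding(char_emb, words_ending_here, word_emb):
    """char_emb: (D,); words_ending_here: list of word ids; word_emb: (V, D)."""
    if not words_ending_here:
        pooled = torch.zeros_like(char_emb)  # no lexicon match at this position
    else:
        pooled = word_emb[words_ending_here].mean(dim=0)
    # character and pooled word vectors concatenated -> fixed size, batchable
    return torch.cat([char_emb, pooled], dim=-1)
```

Because the pooled vector has a fixed size no matter how many lexicon words match, sequences can be padded and batched normally, unlike the DAG-structured lattice model.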
Progressive Fusion Video Super-Resolution Network via Exploiting Non-Local Spatio-Temporal Correlations
Title | Progressive Fusion Video Super-Resolution Network via Exploiting Non-Local Spatio-Temporal Correlations |
Authors | Peng Yi, Zhongyuan Wang, Kui Jiang, Junjun Jiang, Jiayi Ma |
Abstract | How to effectively fuse temporal information from consecutive frames plays an important role in video super-resolution (SR), yet most previous fusion strategies either fail to fully utilize temporal information or cost too much time. In this study, we propose a novel progressive fusion network for video SR, which is designed to make better use of spatio-temporal information and proves more efficient and effective than existing direct fusion, slow fusion, or 3D convolution strategies. Under this progressive fusion framework, we further introduce an improved non-local operation to avoid the complex motion estimation and motion compensation (ME&MC) procedures used in previous video SR approaches. Extensive experiments on public datasets demonstrate that our method surpasses the state of the art by 0.96 dB on average, runs about 3 times faster, and requires only about half of the parameters. |
Tasks | Motion Compensation, Motion Estimation, Super-Resolution, Video Super-Resolution |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Yi_Progressive_Fusion_Video_Super-Resolution_Network_via_Exploiting_Non-Local_Spatio-Temporal_Correlations_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Yi_Progressive_Fusion_Video_Super-Resolution_Network_via_Exploiting_Non-Local_Spatio-Temporal_Correlations_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/progressive-fusion-video-super-resolution |
Repo | https://github.com/psychopa4/PFNL |
Framework | tf |
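A minimal sketch of a generic non-local block in the style of the self-similarity operations this paper builds on; the paper's improved variant reduces the computation differently, so treat this as background rather than the proposed module:

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.theta = nn.Conv2d(c, c // 2, 1)
        self.phi = nn.Conv2d(c, c // 2, 1)
        self.g = nn.Conv2d(c, c // 2, 1)
        self.out = nn.Conv2d(c // 2, c, 1)

    def forward(self, x):                    # x: (B, C, H, W); consecutive
        b, c, h, w = x.shape                 # frames can be stacked along B
        q = self.theta(x).flatten(2).transpose(1, 2)  # (B, HW, C/2)
        k = self.phi(x).flatten(2)                    # (B, C/2, HW)
        v = self.g(x).flatten(2).transpose(1, 2)      # (B, HW, C/2)
        attn = torch.softmax(q @ k, dim=-1)           # pairwise similarities
        y = (attn @ v).transpose(1, 2).reshape(b, c // 2, h, w)
        return x + self.out(y)                        # residual connection
```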
Attentive Feedback Network for Boundary-Aware Salient Object Detection
Title | Attentive Feedback Network for Boundary-Aware Salient Object Detection |
Authors | Mengyang Feng, Huchuan Lu, Errui Ding |
Abstract | Recent deep learning based salient object detection methods achieve gratifying performance built upon Fully Convolutional Neural Networks (FCNs). However, most of them suffer from the boundary challenge: state-of-the-art methods employ feature aggregation techniques and can precisely locate salient objects, but they often fail to segment out the entire object with fine boundaries, especially thin, narrow stripes, so there is still large room for improvement over FCN-based models. In this paper, we design Attentive Feedback Modules (AFMs) to better explore the structure of objects. A Boundary-Enhanced Loss (BEL) is further employed for learning exquisite boundaries. Our proposed deep model produces satisfying results on object boundaries and achieves state-of-the-art performance on five widely tested salient object detection benchmarks. The network is fully convolutional, runs at 26 FPS, and does not need any post-processing. |
Tasks | Object Detection, Salient Object Detection |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Feng_Attentive_Feedback_Network_for_Boundary-Aware_Salient_Object_Detection_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Feng_Attentive_Feedback_Network_for_Boundary-Aware_Salient_Object_Detection_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/attentive-feedback-network-for-boundary-aware |
Repo | https://github.com/ArcherFMY/AFNet |
Framework | none |
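The abstract does not give the exact form of the Boundary-Enhanced Loss, so below is a hedged sketch of one common way to build such a loss: extract soft contours with a morphological gradient (dilation minus erosion, both via max-pooling) and penalize the contour discrepancy:

```python
import torch
import torch.nn.functional as F

def soft_boundary(mask, k=3):
    """mask: (B, 1, H, W) in [0, 1]; returns a soft contour map."""
    dilated = F.max_pool2d(mask, k, stride=1, padding=k // 2)
    eroded = -F.max_pool2d(-mask, k, stride=1, padding=k // 2)  # min-pool
    return dilated - eroded  # morphological gradient highlights boundaries

def boundary_enhanced_loss(pred, gt):
    # penalize disagreement between predicted and ground-truth contours
    return F.l1_loss(soft_boundary(pred), soft_boundary(gt))
```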
Superset Technique for Approximate Recovery in One-Bit Compressed Sensing
Title | Superset Technique for Approximate Recovery in One-Bit Compressed Sensing |
Authors | Larkin Flodin, Venkata Gandikota, Arya Mazumdar |
Abstract | One-bit compressed sensing (1bCS) is a method of signal acquisition under extreme measurement quantization that gives important insights into the limits of signal compression and analog-to-digital conversion. The setting is also equivalent to the problem of learning a sparse hyperplane classifier. In this paper, we propose a generic approach for signal recovery in nonadaptive 1bCS that leads to improved sample complexity for approximate recovery under a variety of signal models, including nonnegative signals and binary signals. We construct 1bCS matrices that are universal, i.e., they work for all signals under a model, and at the same time recover very general random sparse signals with high probability. In our approach, we divide the set of samples (measurements) into two parts, and use the first part to recover a superset of the support of the sparse vector. The second set of measurements is then used to approximate the signal within the superset. While support recovery in 1bCS is well studied, recovering a superset of the support requires fewer samples, which in turn leads to an overall reduction in sample complexity for approximate recovery. |
Tasks | Quantization |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9226-superset-technique-for-approximate-recovery-in-one-bit-compressed-sensing |
http://papers.nips.cc/paper/9226-superset-technique-for-approximate-recovery-in-one-bit-compressed-sensing.pdf | |
PWC | https://paperswithcode.com/paper/superset-technique-for-approximate-recovery |
Repo | https://github.com/flodinl/neurips-1bCS |
Framework | none |
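A NumPy sketch of the two-stage measurement split only. The coordinate-scoring and within-superset estimation rules below are generic stand-ins (plain correlation with the sign measurements), not the paper's universal matrix constructions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m1, m2 = 200, 5, 150, 150
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x /= np.linalg.norm(x)                       # unit-norm k-sparse signal

A1, A2 = rng.standard_normal((m1, n)), rng.standard_normal((m2, n))
y1, y2 = np.sign(A1 @ x), np.sign(A2 @ x)    # one-bit measurements

# Stage 1: score coordinates by correlation with the sign measurements and
# keep a superset (here 2k coordinates) intended to contain the true support.
superset = np.argsort(-np.abs(A1.T @ y1))[: 2 * k]

# Stage 2: estimate within the superset using the second measurement batch
# (correlation restricted to the superset -- a simple proxy estimator).
x_hat = np.zeros(n)
x_hat[superset] = A2[:, superset].T @ y2
x_hat /= np.linalg.norm(x_hat)
print("cosine similarity:", float(x_hat @ x))
```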
Learning Neural Representations for Network Anomaly Detection
Title | Learning Neural Representations for Network Anomaly Detection |
Authors | Van Loi Cao, Miguel Nicolau, James McDermott |
Abstract | This paper proposes latent representation models for improving network anomaly detection. Well-known anomaly detection algorithms often suffer from challenges posed by network data, such as high dimensionality and sparsity, and a lack of anomaly data for training, model selection, and hyperparameter tuning. Our approach introduces new regularizers into a classical autoencoder (AE) and a variational AE, which force normal data into a very tight area centered at the origin, in the non-saturating region of the bottleneck unit activations. Trained on normal data, these AEs push normal points toward the origin, whereas anomalies, which differ from normal data, end up far away from the normal region. The models are very different from common regularized AEs, such as sparse and contractive AEs, in which the regularizers make the latent representation less sensitive to changes in the input data. The bottleneck feature space is then used as a new data representation, and a number of one-class learning algorithms are used to evaluate the proposed models. The experiments show that our models help these classifiers perform efficiently and consistently on high-dimensional, sparse network datasets, even with relatively few training points. More importantly, the models can minimize the effect of model selection on these classifiers, since their performance is insensitive to a wide range of hyperparameter settings. |
Tasks | Anomaly Detection, Intrusion Detection, Model Selection, Unsupervised Anomaly Detection |
Published | 2019-08-01 |
URL | https://ieeexplore.ieee.org/abstract/document/8386786 |
PWC | https://paperswithcode.com/paper/learning-neural-representations-for-network |
Repo | https://github.com/vanloicao/SAEDVAE |
Framework | tf |
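A minimal sketch of the core idea: a classical autoencoder whose loss adds a term pulling the latent codes of normal traffic toward the origin, so anomalies land far from the normal region. The architecture and the weight `alpha` are illustrative:

```python
import torch
import torch.nn as nn

class ShrinkAE(nn.Module):
    def __init__(self, d_in, d_z=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 64), nn.Tanh(),
                                 nn.Linear(64, d_z), nn.Tanh())
        self.dec = nn.Sequential(nn.Linear(d_z, 64), nn.Tanh(),
                                 nn.Linear(64, d_in))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

def loss_fn(x, x_rec, z, alpha=1.0):
    # reconstruction + pull normal codes into a tight ball around the origin
    return ((x - x_rec) ** 2).mean() + alpha * (z ** 2).sum(dim=1).mean()
```

The tanh bottleneck keeps codes near the origin in its non-saturating region, matching the behavior the abstract describes.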
DF2Net: A Dense-Fine-Finer Network for Detailed 3D Face Reconstruction
Title | DF2Net: A Dense-Fine-Finer Network for Detailed 3D Face Reconstruction |
Authors | Xiaoxing Zeng, Xiaojiang Peng, Yu Qiao |
Abstract | Reconstructing detailed geometric structure from a single face image is a challenging problem due to its ill-posed nature and the fine 3D structures to be recovered. This paper proposes a deep Dense-Fine-Finer Network (DF2Net) to address this problem. DF2Net decomposes the reconstruction process into three stages, each processed by an elaborately designed network, namely D-Net, F-Net, and Fr-Net. D-Net exploits a U-net architecture to map the input image to a dense depth image. F-Net refines the output of D-Net by integrating features from the depth and RGB domains, and its output is further enhanced by Fr-Net with a novel multi-resolution hypercolumn architecture. In addition, we introduce three types of data to train these networks: 3D-model synthetic data, 2D-image reconstructed data, and fine facial images. We elaborately exploit different datasets (or combinations) together with well-designed losses to train the different networks. Qualitative evaluation indicates that our DF2Net can effectively reconstruct subtle facial details such as small crow's feet and wrinkles. DF2Net achieves performance superior or comparable to state-of-the-art algorithms in qualitative and quantitative analyses on real-world images and the BU-3DFE dataset. Code and the collected 70K image-depth data will be publicly available. |
Tasks | 3D Face Reconstruction, Face Reconstruction |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Zeng_DF2Net_A_Dense-Fine-Finer_Network_for_Detailed_3D_Face_Reconstruction_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Zeng_DF2Net_A_Dense-Fine-Finer_Network_for_Detailed_3D_Face_Reconstruction_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/df2net-a-dense-fine-finer-network-for |
Repo | https://github.com/xiaoxingzeng/DF2Net |
Framework | pytorch |
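A schematic sketch of the three-stage cascade described in the abstract: D-Net maps RGB to a dense depth map, F-Net refines it from RGB plus depth, and Fr-Net refines further. The `Stage` module is a placeholder, not the paper's U-net or hypercolumn architectures:

```python
import torch
import torch.nn as nn

class Stage(nn.Module):
    """Placeholder conv block standing in for D-Net / F-Net / Fr-Net."""
    def __init__(self, c_in):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(c_in, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 1, 3, padding=1))
    def forward(self, x):
        return self.net(x)

d_net, f_net, fr_net = Stage(3), Stage(4), Stage(4)
rgb = torch.randn(1, 3, 128, 128)
depth = d_net(rgb)                                  # dense depth from RGB
depth = depth + f_net(torch.cat([rgb, depth], 1))   # fuse RGB + depth features
depth = depth + fr_net(torch.cat([rgb, depth], 1))  # finer residual refinement
```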
Bidirectional Transition-Based Dependency Parsing
Title | Bidirectional Transition-Based Dependency Parsing |
Authors | Yunzhe Yuan, Yong Jiang, Kewei Tu |
Abstract | Transition-based dependency parsing is a fast and effective approach for dependency parsing. Traditionally, a transition-based dependency parser processes an input sentence and predicts a sequence of parsing actions in a left-to-right manner. During this process, an early prediction error may negatively impact the prediction of subsequent actions. In this paper, we propose a simple framework for bidirectional transition-based parsing. During training, we learn a left-to-right parser and a right-to-left parser separately. To parse a sentence, we perform joint decoding with the two parsers. We propose three joint decoding algorithms based on joint scoring, dual decomposition, and dynamic oracle, respectively. Empirical results show that our methods lead to competitive parsing accuracy, and our method based on dynamic oracle consistently achieves the best performance. |
Tasks | Dependency Parsing, Transition-Based Dependency Parsing |
Published | 2019-07-17 |
URL | https://aaai.org/ojs/index.php/AAAI/article/view/4733 |
https://aaai.org/ojs/index.php/AAAI/article/view/4733/4611 | |
PWC | https://paperswithcode.com/paper/bidirectional-transition-based-dependency |
Repo | https://github.com/yuanyunzhe/bi-trans-parser |
Framework | none |
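A sketch of the simplest of the three decoding schemes, joint scoring: collect candidate trees from both directional parsers and keep the one whose summed scores are highest. The `k_best`/`score` parser API is hypothetical:

```python
def joint_score_decode(sentence, l2r_parser, r2l_parser, k=8):
    # candidates pooled from both directions (hypothetical k-best interface)
    candidates = l2r_parser.k_best(sentence, k) + r2l_parser.k_best(sentence, k)
    # pick the tree that both parsers jointly prefer
    return max(candidates,
               key=lambda tree: l2r_parser.score(sentence, tree)
                              + r2l_parser.score(sentence, tree))
```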
Reconciling λ-Returns with Experience Replay
Title | Reconciling λ-Returns with Experience Replay |
Authors | Brett Daley, Christopher Amato |
Abstract | Modern deep reinforcement learning methods have departed from the incremental learning required for eligibility traces, rendering the implementation of the λ-return difficult in this context. In particular, off-policy methods that utilize experience replay remain problematic because their random sampling of minibatches is not conducive to the efficient calculation of λ-returns. Yet replay-based methods are often the most sample-efficient, and incorporating λ-returns into them is a viable way to achieve new state-of-the-art performance. To this end, we propose the first method to enable practical use of λ-returns in arbitrary replay-based methods without relying on other forms of decorrelation such as asynchronous gradient updates. By promoting short sequences of past transitions into a small cache within the replay memory, adjacent λ-returns can be efficiently precomputed by sharing Q-values. Computation is not wasted on experiences that are never sampled, and stored λ-returns behave as stable temporal-difference (TD) targets that replace the target network. Additionally, our method grants the unique ability to observe TD errors prior to sampling; for the first time, transitions can be prioritized by their true significance rather than by a proxy to it. Furthermore, we propose the novel use of the TD error to dynamically select λ-values that facilitate faster learning. We show that these innovations can enhance the performance of DQN when playing Atari 2600 games, even under partial observability. While our work specifically focuses on λ-returns, these ideas are applicable to any multi-step return estimator. |
Tasks | Atari Games |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8397-reconciling-returns-with-experience-replay |
http://papers.nips.cc/paper/8397-reconciling-returns-with-experience-replay.pdf | |
PWC | https://paperswithcode.com/paper/reconciling-returns-with-experience-replay |
Repo | https://github.com/brett-daley/dqn-lambda |
Framework | tf |
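A sketch of how λ-returns can be precomputed in one backward pass over a cached sequence, using a standard recursive formulation G_t = r_t + γ[(1−λ)·max_a Q(s_{t+1}, a) + λ·G_{t+1}]; `q_max_next` would come from evaluating the network once over the cached states:

```python
import numpy as np

def lambda_returns(rewards, q_max_next, dones, gamma=0.99, lam=0.8):
    """rewards, q_max_next, dones: arrays over a cached sequence of length T."""
    T = len(rewards)
    G = np.empty(T)
    running = q_max_next[-1]                   # bootstrap beyond the cached block
    for t in reversed(range(T)):
        bootstrap = 0.0 if dones[t] else q_max_next[t]
        running = 0.0 if dones[t] else running  # no bootstrapping past episode ends
        running = rewards[t] + gamma * ((1 - lam) * bootstrap + lam * running)
        G[t] = running
    return G

G = lambda_returns(np.array([0.0, 0.0, 1.0]),        # rewards in the block
                   np.array([0.5, 0.7, 0.0]),        # max_a Q(s_{t+1}, a)
                   np.array([False, False, True]))   # episode terminations
```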
Learning an Event Sequence Embedding for Dense Event-Based Deep Stereo
Title | Learning an Event Sequence Embedding for Dense Event-Based Deep Stereo |
Authors | Stepan Tulyakov, Francois Fleuret, Martin Kiefel, Peter Gehler, Michael Hirsch |
Abstract | Today, the frame-based camera is the sensor of choice for machine vision applications. However, these cameras, originally developed for acquiring static images rather than sensing dynamic, uncontrolled visual environments, suffer from high power consumption, high data rates, high latency, and low dynamic range. An event-based image sensor addresses these drawbacks by mimicking a biological retina: instead of measuring the intensity of every pixel at a fixed time interval, it reports events of significant pixel intensity change. Every such event is represented by its position, sign of change, and timestamp, accurate to the microsecond. Asynchronous event sequences require special handling, since traditional algorithms work only with synchronous, spatially gridded data. To address this problem, we introduce a new module for event sequence embedding, for use in different applications. The module builds a representation of an event sequence by first aggregating information locally across time, using a novel fully-connected layer for an irregularly sampled continuous domain, and then across the discrete spatial domain. Based on this module, we design a deep learning-based stereo method for event-based cameras. The proposed method is the first learning-based stereo method for an event-based camera and the only method that produces dense results. We show large performance increases on the Multi Vehicle Stereo Event Camera Dataset (MVSEC), which has become the standard benchmark for event-based stereo methods. |
Tasks | |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Tulyakov_Learning_an_Event_Sequence_Embedding_for_Dense_Event-Based_Deep_Stereo_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Tulyakov_Learning_an_Event_Sequence_Embedding_for_Dense_Event-Based_Deep_Stereo_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/learning-an-event-sequence-embedding-for |
Repo | https://github.com/tlkvstepan/event_stereo_ICCV2019 |
Framework | pytorch |
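A hedged sketch of the embedding idea only: aggregate each pixel's most recent events across time, then across space. The paper's novel fully-connected layer for an irregularly sampled continuous domain is replaced here by an ordinary MLP plus max-pool placeholder:

```python
import torch
import torch.nn as nn

class EventEmbedding(nn.Module):
    def __init__(self, d=16):
        super().__init__()
        self.temporal = nn.Sequential(nn.Linear(2, d), nn.ReLU(),
                                      nn.Linear(d, d))
        self.spatial = nn.Conv2d(d, d, 3, padding=1)

    def forward(self, ev):        # ev: (B, H, W, K, 2) = K most recent events
        f = self.temporal(ev)     # embed each (rel. time, polarity) pair
        f = f.amax(dim=3)         # aggregate locally across time (max over K)
        f = f.permute(0, 3, 1, 2)                  # to (B, d, H, W)
        return self.spatial(f)    # aggregate across the discrete spatial domain
```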
Offline Contextual Bandits with High Probability Fairness Guarantees
Title | Offline Contextual Bandits with High Probability Fairness Guarantees |
Authors | Blossom Metevier, Stephen Giguere, Sarah Brockman, Ari Kobren, Yuriy Brun, Emma Brunskill, Philip S. Thomas |
Abstract | We present RobinHood, an offline contextual bandit algorithm designed to satisfy a broad family of fairness constraints. Our algorithm accepts multiple fairness definitions and allows users to construct their own unique fairness definitions for the problem at hand. We provide a theoretical analysis of RobinHood, which includes a proof that it will not return an unfair solution with probability greater than a user-specified threshold. We validate our algorithm on three applications: a tutoring system in which we conduct a user study and consider multiple unique fairness definitions; a loan approval setting (using the Statlog German credit data set) in which well-known fairness definitions are applied; and criminal recidivism (using data released by ProPublica). In each setting, our algorithm is able to produce fair policies that achieve performance competitive with other offline and online contextual bandit algorithms. |
Tasks | Multi-Armed Bandits |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9630-offline-contextual-bandits-with-high-probability-fairness-guarantees |
http://papers.nips.cc/paper/9630-offline-contextual-bandits-with-high-probability-fairness-guarantees.pdf | |
PWC | https://paperswithcode.com/paper/offline-contextual-bandits-with-high |
Repo | https://github.com/sgiguere/RobinHood-NeurIPS-2019 |
Framework | none |
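A sketch of the high-probability guarantee mechanism in the Seldonian style: return a candidate policy only if a high-confidence upper bound on its unfairness, computed on held-out data, is non-positive; otherwise report "no solution found." The Hoeffding bound is a generic stand-in for the paper's concentration inequalities:

```python
import numpy as np

def high_confidence_upper_bound(g_samples, delta=0.05, g_range=1.0):
    """Hoeffding upper bound on E[g]; g > 0 means the constraint is violated."""
    n = len(g_samples)
    return g_samples.mean() + g_range * np.sqrt(np.log(1 / delta) / (2 * n))

def safety_test(policy, g_estimator, held_out, delta=0.05):
    # g_estimator(policy, x): unbiased per-sample estimate of unfairness g
    g = np.array([g_estimator(policy, x) for x in held_out])
    if high_confidence_upper_bound(g, delta) <= 0.0:
        return policy
    return None  # "no solution found" -- never return a likely-unfair policy
```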
BehaveNet: nonlinear embedding and Bayesian neural decoding of behavioral videos
Title | BehaveNet: nonlinear embedding and Bayesian neural decoding of behavioral videos |
Authors | Eleanor Batty, Matthew Whiteway, Shreya Saxena, Dan Biderman, Taiga Abe, Simon Musall, Winthrop Gillis, Jeffrey Markowitz, Anne Churchland, John P. Cunningham, Sandeep R. Datta, Scott Linderman, Liam Paninski |
Abstract | A fundamental goal of systems neuroscience is to understand the relationship between neural activity and behavior. Behavior has traditionally been characterized by low-dimensional, task-related variables such as movement speed or response times. More recently, there has been a growing interest in automated analysis of high-dimensional video data collected during experiments. Here we introduce a probabilistic framework for the analysis of behavioral video and neural activity. This framework provides tools for compression, segmentation, generation, and decoding of behavioral videos. Compression is performed using a convolutional autoencoder (CAE), which yields a low-dimensional continuous representation of behavior. We then use an autoregressive hidden Markov model (ARHMM) to segment the CAE representation into discrete “behavioral syllables.” The resulting generative model can be used to simulate behavioral video data. Finally, based on this generative model, we develop a novel Bayesian decoding approach that takes in neural activity and outputs probabilistic estimates of the full-resolution behavioral video. We demonstrate this framework on two different experimental paradigms using distinct behavioral and neural recording technologies. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9701-behavenet-nonlinear-embedding-and-bayesian-neural-decoding-of-behavioral-videos |
http://papers.nips.cc/paper/9701-behavenet-nonlinear-embedding-and-bayesian-neural-decoding-of-behavioral-videos.pdf | |
PWC | https://paperswithcode.com/paper/behavenet-nonlinear-embedding-and-bayesian |
Repo | https://github.com/ebatty/behavenet |
Framework | pytorch |
Graph-based Dependency Parsing with Graph Neural Networks
Title | Graph-based Dependency Parsing with Graph Neural Networks |
Authors | Tao Ji, Yuanbin Wu, Man Lan |
Abstract | We investigate the problem of efficiently incorporating high-order features into neural graph-based dependency parsing. Instead of explicitly extracting high-order features from intermediate parse trees, we develop a more powerful dependency-tree node representation which captures high-order information concisely and efficiently. We use graph neural networks (GNNs) to learn the representations and discuss several new configurations of the GNN's updating and aggregation functions. Experiments show that our parser achieves the best UAS and LAS on PTB (96.0%, 94.3%) among systems that use no external resources. |
Tasks | Dependency Parsing |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1237/ |
https://www.aclweb.org/anthology/P19-1237 | |
PWC | https://paperswithcode.com/paper/graph-based-dependency-parsing-with-graph |
Repo | https://github.com/AntNLP/gnn-dep-parsing |
Framework | none |
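A minimal sketch of the core idea: word representations updated by a GNN layer whose soft adjacency comes from the current arc-attention scores, so each update mixes in high-order head/child information. The plain weighted-sum aggregation below is just one of many possible configurations:

```python
import torch
import torch.nn as nn

class SoftGraphLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.arc_q = nn.Linear(d, d)
        self.arc_k = nn.Linear(d, d)
        self.update = nn.Linear(2 * d, d)

    def forward(self, h):                       # h: (B, N, d) word states
        scores = self.arc_q(h) @ self.arc_k(h).transpose(1, 2)
        adj = torch.softmax(scores, dim=-1)     # soft head distribution per word
        msg = adj @ h                           # expected head representation
        return torch.tanh(self.update(torch.cat([h, msg], dim=-1)))
```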
Molecule Property Prediction Based on Spatial Graph Embedding
Title | Molecule Property Prediction Based on Spatial Graph Embedding |
Authors | Xiaofeng Wang, Zhen Li, Mingjian Jiang, Shuang Wang, Shugang Zhang, Zhiqiang Wei |
Abstract | Accurate prediction of molecular properties is important for the design of new compounds, a crucial step in drug discovery. In this paper, molecular graph data is used for property prediction based on graph convolutional neural networks. A convolution spatial graph embedding layer (C-SGEL) is introduced to retain the spatial connectivity information of molecules, and multiple C-SGELs are stacked to construct a convolution spatial graph embedding network (C-SGEN) for end-to-end representation learning. To enhance the robustness of the network, molecular fingerprints are also combined with C-SGEN to build a composite model for predicting molecular properties. Our comparative experiments show that our method is accurate and achieves the best results on several open benchmark datasets. |
Tasks | Drug Discovery, Graph Embedding, Graph Regression, Representation Learning |
Published | 2019-08-22 |
URL | https://doi.org/10.1021/acs.jcim.9b00410 |
https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.9b00410?rand=oin4mnup | |
PWC | https://paperswithcode.com/paper/molecule-property-prediction-based-on-spatial |
Repo | https://github.com/wxfsd/C-SGEN |
Framework | pytorch |
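A schematic sketch: a graph-convolution-style layer over the molecular adjacency matrix (retaining spatial connectivity), stacked and then concatenated with a fingerprint vector for the composite prediction. Layer details are illustrative, not the paper's exact C-SGEL:

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, x, adj):       # x: (N, d_in); adj: (N, N) with self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        return torch.relu(self.lin(adj @ x / deg))   # mean over neighbors

def predict(atom_feats, adj, fingerprint, layers, head):
    # e.g. layers = [GraphConv(39, 64), GraphConv(64, 64)];
    #      head = nn.Linear(64 + fingerprint_dim, 1)
    h = atom_feats
    for layer in layers:             # stacked C-SGEL-like layers
        h = layer(h, adj)
    graph_vec = h.mean(dim=0)        # readout: average atom embeddings
    return head(torch.cat([graph_vec, fingerprint]))  # composite model
```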