January 31, 2020


Paper Group ANR 25


Likelihood-Free Overcomplete ICA and Applications in Causal Discovery


Title	Likelihood-Free Overcomplete ICA and Applications in Causal Discovery
Authors	Chenwei Ding, Mingming Gong, Kun Zhang, Dacheng Tao
Abstract	Causal discovery has witnessed significant progress over the past decades. In particular, many recent causal discovery methods make use of independent, non-Gaussian noise to achieve identifiability of the causal models. The existence of hidden direct common causes, or confounders, generally makes causal discovery more difficult; whenever they are present, the corresponding causal discovery algorithms can be seen as extensions of overcomplete independent component analysis (OICA). However, existing OICA algorithms usually make strong parametric assumptions on the distribution of independent components, which may be violated on real data, leading to sub-optimal or even wrong solutions. In addition, existing OICA algorithms rely on the Expectation Maximization (EM) procedure, which requires computationally expensive inference of the posterior distribution of independent components. To tackle these problems, we present a Likelihood-Free Overcomplete ICA algorithm (LFOICA) that estimates the mixing matrix directly by back-propagation without any explicit assumptions on the density function of independent components. Thanks to its computational efficiency, the proposed method makes a number of causal discovery procedures much more practically feasible. For illustrative purposes, we demonstrate the computational efficiency and efficacy of our method in two causal discovery tasks on both synthetic and real data.
Tasks	Causal Discovery
Published	2019-09-04
URL	https://arxiv.org/abs/1909.01525v2
PDF	https://arxiv.org/pdf/1909.01525v2.pdf
PWC	https://paperswithcode.com/paper/likelihood-free-overcomplete-ica-and
Repo
Framework

ScarfNet: Multi-scale Features with Deeply Fused and Redistributed Semantics for Enhanced Object Detection


Title	ScarfNet: Multi-scale Features with Deeply Fused and Redistributed Semantics for Enhanced Object Detection
Authors	Jin Hyeok Yoo, Dongsuk Kum, Jun Won Choi
Abstract	Convolutional neural networks (CNNs) have led to significant progress in object detection. To detect objects of various sizes, object detectors often exploit the hierarchy of multi-scale feature maps called a feature pyramid, which is readily obtained from the CNN architecture. However, the performance of these object detectors is limited because the bottom-level feature maps, which pass through fewer convolutional layers, lack the semantic information needed to capture the characteristics of small objects. To address this problem, various methods have been proposed to increase the depth of the bottom-level features used for object detection. While most approaches generate additional features through a top-down pathway with lateral connections, our approach directly fuses multi-scale feature maps using bidirectional long short-term memory (biLSTM) in an effort to generate deeply fused semantics. The resulting semantic information is then redistributed to the individual pyramidal feature at each scale through a channel-wise attention model. We integrate our semantic combining and attentive redistribution feature network (ScarfNet) with baseline object detectors, i.e., Faster R-CNN, single-shot multibox detector (SSD), and RetinaNet. Our experiments show that our method outperforms the existing feature pyramid methods as well as the baseline detectors and achieves state-of-the-art performance on the PASCAL VOC and COCO detection benchmarks.
Tasks	Object Detection
Published	2019-08-01
URL	https://arxiv.org/abs/1908.00328v2
PDF	https://arxiv.org/pdf/1908.00328v2.pdf
PWC	https://paperswithcode.com/paper/scarfnet-multi-scale-features-with-deeply
Repo
Framework

Knowing When to Stop: Evaluation and Verification of Conformity to Output-size Specifications


Title	Knowing When to Stop: Evaluation and Verification of Conformity to Output-size Specifications
Authors	Chenglong Wang, Rudy Bunel, Krishnamurthy Dvijotham, Po-Sen Huang, Edward Grefenstette, Pushmeet Kohli
Abstract	Models such as Sequence-to-Sequence and Image-to-Sequence are widely used in real-world applications. While the ability of these neural architectures to produce variable-length outputs makes them extremely effective for problems like Machine Translation and Image Captioning, it also leaves them vulnerable to failures in which the model produces outputs of undesirable length. This behavior can have severe consequences, such as increased computation and faults in downstream modules that expect outputs of a certain length. Motivated by the need to better understand the failures of these models, this paper proposes and studies the novel output-size modulation problem and makes two key technical contributions. First, to evaluate model robustness, we develop an easy-to-compute differentiable proxy objective that can be used with gradient-based algorithms to find output-lengthening inputs. Second, and more importantly, we develop a verification approach that can formally verify whether a network always produces outputs within a certain length. Experimental results on Machine Translation and Image Captioning show that our output-lengthening approach can produce outputs that are 50 times longer than the input, while our verification approach can, given a model and input domain, prove that the output length is below a certain size.
Tasks	Image Captioning, Machine Translation
Published	2019-04-26
URL	http://arxiv.org/abs/1904.12004v1
PDF	http://arxiv.org/pdf/1904.12004v1.pdf
PWC	https://paperswithcode.com/paper/knowing-when-to-stop-evaluation-and
Repo
Framework

IPO: Interior-point Policy Optimization under Constraints


Title	IPO: Interior-point Policy Optimization under Constraints
Authors	Yongshuai Liu, Jiaxin Ding, Xin Liu
Abstract	In this paper, we study reinforcement learning (RL) algorithms to solve real-world decision problems with the objective of maximizing the long-term reward as well as satisfying cumulative constraints. We propose a novel first-order policy optimization method, Interior-point Policy Optimization (IPO), which augments the objective with logarithmic barrier functions, inspired by the interior-point method. Our proposed method is easy to implement with performance guarantees and can handle general types of cumulative multiconstraint settings. We conduct extensive evaluations to compare our approach with state-of-the-art baselines. Our algorithm outperforms the baseline algorithms, in terms of reward maximization and constraint satisfaction.
Tasks
Published	2019-10-21
URL	https://arxiv.org/abs/1910.09615v1
PDF	https://arxiv.org/pdf/1910.09615v1.pdf
PWC	https://paperswithcode.com/paper/ipo-interior-point-policy-optimization-under
Repo
Framework
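
The IPO abstract above describes augmenting the reward objective with logarithmic barrier functions. The snippet below is a minimal sketch of that generic barrier idea only, not the authors' implementation: the function names, the barrier sharpness `t`, and the form `log(limit - cost) / t` are illustrative assumptions.

```python
import math


def log_barrier(constraint_value, limit, t=20.0):
    """Barrier term log(limit - constraint_value) / t (assumed form).

    The term is close to zero when the constraint has plenty of slack and
    falls towards -inf as the constraint approaches its limit.
    """
    slack = limit - constraint_value
    if slack <= 0.0:
        # Constraint violated or tight: the barrier rejects this point.
        return float("-inf")
    return math.log(slack) / t


def barrier_augmented_objective(reward_objective, constraint_values, limits, t=20.0):
    """Reward objective plus one log-barrier term per cumulative constraint."""
    penalty = sum(log_barrier(c, d, t) for c, d in zip(constraint_values, limits))
    return reward_objective + penalty
```

A larger `t` makes the barrier a tighter approximation of a hard constraint, at the cost of steeper gradients near the boundary.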

A Question Answering System Using Graph-Pattern Association Rules (QAGPAR) On YAGO Knowledge Base


Title	A Question Answering System Using Graph-Pattern Association Rules (QAGPAR) On YAGO Knowledge Base
Authors	Wahyudi, Masayu Leylia Khodra, Ary Setijadi Prihatmanto, Carmadi Machbub
Abstract	A question answering system (QA System) was developed that uses graph-pattern association rules on the YAGO knowledge base. The system takes a user question as input and returns an answer as output. If the answer is missing or unavailable in the database, then graph-pattern association rules are used to get the answer. The architecture of this question answering system is as follows: question classification, graph component generation, query generation, and query processing. The question answering system uses association graph patterns in a waterfall model. In this paper, the architecture of the system is described, specifically discussing its reasoning and performance capabilities. The result of this research is that rules with high confidence and correct logic produce correct answers, and vice versa.
Tasks	Question Answering
Published	2019-02-02
URL	http://arxiv.org/abs/1902.00624v1
PDF	http://arxiv.org/pdf/1902.00624v1.pdf
PWC	https://paperswithcode.com/paper/a-question-answering-system-using-graph
Repo
Framework


Multi-Modal Three-Stream Network for Action Recognition


Title	Multi-Modal Three-Stream Network for Action Recognition
Authors	Muhammad Usman Khalid, Jie Yu
Abstract	Human action recognition in video is an active yet challenging research topic due to the high variation and complexity of the data. In this paper, a novel video-based action recognition framework utilizing complementary cues is proposed to handle this complex problem. Inspired by the successful two-stream networks for action classification, additional pose features are studied and fused to enhance the understanding of human action in a more abstract and semantic way. For practical use, not only ground-truth poses but also noisy estimated poses are incorporated in the framework through our proposed pre-processing module. The whole framework and each cue are evaluated on several benchmark datasets: JHMDB, sub-JHMDB, and Penn Action. Our results outperform the state of the art on these datasets and show the strength of complementary cues.
Tasks	Action Classification, Temporal Action Localization
Published	2019-09-08
URL	https://arxiv.org/abs/1909.03466v1
PDF	https://arxiv.org/pdf/1909.03466v1.pdf
PWC	https://paperswithcode.com/paper/multi-modal-three-stream-network-for-action
Repo
Framework

Second Order Value Iteration in Reinforcement Learning


Title	Second Order Value Iteration in Reinforcement Learning
Authors	Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar
Abstract	Value iteration is a fixed-point iteration technique used to obtain the optimal value function and policy in a discounted-reward Markov Decision Process (MDP). Here, a contraction operator is constructed and applied repeatedly to arrive at the optimal solution. Value iteration is a first-order method and may therefore take a large number of iterations to converge to the optimal solution. In this work, we propose a novel second-order value iteration procedure based on the Newton-Raphson method. We first construct a modified contraction operator and then apply the Newton-Raphson method to arrive at our algorithm. We prove the global convergence of our algorithm to the optimal solution and show the second-order convergence. Through experiments, we demonstrate the effectiveness of our proposed approach.
Tasks
Published	2019-05-10
URL	https://arxiv.org/abs/1905.03927v1
PDF	https://arxiv.org/pdf/1905.03927v1.pdf
PWC	https://paperswithcode.com/paper/second-order-value-iteration-in-reinforcement
Repo
Framework
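
For context on the abstract above, the classic first-order value iteration it starts from can be sketched as repeated application of the Bellman optimality operator. This is only the textbook baseline, not the second-order method the paper proposes; the MDP encoding (`P[s][a]` as a list of `(probability, next_state)` pairs, `R[s][a]` as a reward) is an assumption made for this illustration.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Classic value iteration for a small tabular MDP.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] is the
    immediate reward for taking action a in state s (hypothetical encoding).
    """
    num_states = len(P)
    V = [0.0] * num_states
    for _ in range(max_iter):
        # Apply the Bellman optimality operator once to every state.
        V_new = [
            max(
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in range(len(P[s]))
            )
            for s in range(num_states)
        ]
        # The operator is a gamma-contraction, so successive iterates get closer.
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new
    return V


# Two states: state 0 can stay (reward 0) or move to state 1 (reward 0);
# state 1 loops on itself with reward 1.
P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)]]]
R = [[0.0, 0.0], [1.0]]
V = value_iteration(P, R, gamma=0.9)
```

Because the error shrinks by roughly a factor of `gamma` per sweep, the number of sweeps grows quickly as `gamma` approaches 1, which is the slow convergence the abstract refers to.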


A Multi-Modal Feature Embedding Approach to Diagnose Alzheimer Disease from Spoken Language


Title	A Multi-Modal Feature Embedding Approach to Diagnose Alzheimer Disease from Spoken Language
Authors	S. Soroush Haj Zargarbashi, Bagher Babaali
Abstract	Introduction: Alzheimer’s disease is a type of dementia in which early diagnosis plays a major role in the quality of treatment. Many recent works on the diagnosis of Alzheimer’s disease analyze the voice stream acoustically, syntactically, or both. The tools most commonly used for these analyses are machine learning techniques. Objective: Designing an automatic machine-learning-based diagnosis system will help in early detection. Systems that use noninvasive data are also preferable. Methods: We use a classification system based on spoken language. We use three (statistical and neural) approaches to classify audio signals from spoken language into two classes, dementia and control. Result: This work designs a multi-modal feature embedding on the spoken-language audio signal using three approaches: N-gram, i-vector, and x-vector. The system is evaluated on the cookie picture description task from the Pitt Corpus of DementiaBank, with an accuracy of 83.6%.
Tasks
Published	2019-10-01
URL	https://arxiv.org/abs/1910.00330v1
PDF	https://arxiv.org/pdf/1910.00330v1.pdf
PWC	https://paperswithcode.com/paper/a-multi-modal-feature-embedding-approach-to
Repo
Framework

Finite-Time Analysis and Restarting Scheme for Linear Two-Time-Scale Stochastic Approximation


Title	Finite-Time Analysis and Restarting Scheme for Linear Two-Time-Scale Stochastic Approximation
Authors	Thinh T. Doan
Abstract	Motivated by its broad applications in reinforcement learning, we study linear two-time-scale stochastic approximation, an iterative method that uses two different step sizes to find the solution of a system of two equations. Our main focus is to characterize the finite-time complexity of this method under time-varying step sizes and Markovian noise. In particular, we show that the mean square errors of the variables generated by the method converge to zero at a sublinear rate $\mathcal{O}(k^{-2/3})$, where $k$ is the number of iterations. We then improve the performance of this method with a restarting scheme, in which we restart the algorithm after every predetermined number of iterations. We show that with this restarting scheme the complexity of the algorithm under time-varying step sizes is as good as the one using constant step sizes, while still achieving exact convergence to the desired solution. Moreover, the restarting scheme also helps to prevent the step sizes from getting too small, which is useful for the practical implementation of linear two-time-scale stochastic approximation.
Tasks
Published	2019-12-23
URL	https://arxiv.org/abs/1912.10583v2
PDF	https://arxiv.org/pdf/1912.10583v2.pdf
PWC	https://paperswithcode.com/paper/finite-time-analysis-and-restarting-scheme
Repo
Framework
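
To make the two-step-size idea in the abstract above concrete, here is a minimal scalar sketch of linear two-time-scale stochastic approximation. Everything specific in it is an assumption for illustration: the 2x2 system, the step-size schedules (a faster `alpha` for `x`, a slower `beta` for `y`), and the use of i.i.d. Gaussian noise instead of the Markovian noise analyzed in the paper. The restarting scheme is not shown.

```python
import random


def two_time_scale_sa(num_iters=20000, noise_std=0.1, seed=0):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 from noisy residuals.

    The system below is a hypothetical example whose solution is x = 1, y = 2.
    """
    rng = random.Random(seed)
    a11, a12, b1 = 1.0, 0.5, 2.0
    a21, a22, b2 = 0.5, 1.0, 2.5
    x, y = 0.0, 0.0
    for k in range(num_iters):
        alpha = 1.0 / (k + 1) ** (2.0 / 3.0)  # fast time scale (assumed schedule)
        beta = 1.0 / (k + 1)                  # slow time scale (assumed schedule)
        # Noisy residuals of the two equations at the current iterate.
        gx = a11 * x + a12 * y - b1 + rng.gauss(0.0, noise_std)
        gy = a21 * x + a22 * y - b2 + rng.gauss(0.0, noise_std)
        # Both variables are updated from the same old iterate.
        x, y = x - alpha * gx, y - beta * gy
    return x, y
```

Note how `beta` becomes very small for large `k`; this is the practical issue that the abstract says restarting helps to avoid.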

Neural News Recommendation with Attentive Multi-View Learning


Title	Neural News Recommendation with Attentive Multi-View Learning
Authors	Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang, Xing Xie
Abstract	Personalized news recommendation is very important for online news platforms to help users find news they are interested in and to improve user experience. News and user representation learning is critical for news recommendation. Existing news recommendation methods usually learn these representations from a single kind of news information, e.g., the title, which may be insufficient. In this paper we propose a neural news recommendation approach that can learn informative representations of users and news by exploiting different kinds of news information. The core of our approach is a news encoder and a user encoder. In the news encoder we propose an attentive multi-view learning model to learn unified news representations from titles, bodies, and topic categories by regarding them as different views of news. In addition, we apply both word-level and view-level attention mechanisms to the news encoder to select important words and views for learning informative news representations. In the user encoder we learn the representations of users based on their browsed news and apply an attention mechanism to select informative news for user representation learning. Extensive experiments on a real-world dataset show that our approach can effectively improve the performance of news recommendation.
Tasks	Multi-View Learning, Representation Learning
Published	2019-07-12
URL	https://arxiv.org/abs/1907.05576v1
PDF	https://arxiv.org/pdf/1907.05576v1.pdf
PWC	https://paperswithcode.com/paper/neural-news-recommendation-with-attentive
Repo
Framework

Attention: A Big Surprise for Cross-Domain Person Re-Identification


Title	Attention: A Big Surprise for Cross-Domain Person Re-Identification
Authors	Haijun Liu, Jian Cheng, Shiguang Wang, Wen Wang
Abstract	In this paper, we focus on model generalization and adaptation for cross-domain person re-identification (Re-ID). Unlike existing cross-domain Re-ID methods, which leverage auxiliary information from unlabeled target-domain data, we aim to enhance model generalization and adaptation through discriminative feature learning, and directly apply a pre-trained model to new domains (datasets) without using any information from the target domains. To address the discriminative feature learning problem, we surprisingly find that simply introducing an attention mechanism to adaptively extract the person features for every domain is highly effective. We adopt two popular types of attention mechanisms: long-range dependency based attention and direct generation based attention. Both can perform attention along the spatial or channel dimension alone, or along a combination of the two. The outlines of the different attention mechanisms are illustrated. Moreover, we incorporate the attention results into the final output of the model through skip connections to improve the features with both high- and middle-level semantic visual information. When a pre-trained model is applied directly to new domains, the attention incorporation method enhances model generalization and adaptation for cross-domain person Re-ID. We conduct extensive experiments across three large datasets: Market-1501, DukeMTMC-reID, and MSMT17. Surprisingly, introducing only attention can achieve state-of-the-art performance, even much better than cross-domain Re-ID methods that use auxiliary information from the target domain.
Tasks	Person Re-Identification
Published	2019-05-30
URL	https://arxiv.org/abs/1905.12830v1
PDF	https://arxiv.org/pdf/1905.12830v1.pdf
PWC	https://paperswithcode.com/paper/attention-a-big-surprise-for-cross-domain
Repo
Framework

Neuromorphic Architecture Optimization for Task-Specific Dynamic Learning


Title	Neuromorphic Architecture Optimization for Task-Specific Dynamic Learning
Authors	Sandeep Madireddy, Angel Yanguas-Gil, Prasanna Balaprakash
Abstract	The ability to learn and adapt in real time is a central feature of biological systems. Neuromorphic architectures demonstrating such versatility can greatly enhance our ability to efficiently process information at the edge. A key challenge, however, is to understand which learning rules are best suited for specific tasks and how the relevant hyperparameters can be fine-tuned. In this work, we introduce a conceptual framework in which the learning process is integrated into the network itself. This allows us to cast meta-learning as a mathematical optimization problem. We employ DeepHyper, a scalable, asynchronous model-based search, to simultaneously optimize the choice of meta-learning rules and their hyperparameters. We demonstrate our approach with two different datasets, MNIST and FashionMNIST, using a network architecture inspired by the learning center of the insect brain. Our results show that optimal learning rules can be dataset-dependent even within similar tasks. This dependency demonstrates the importance of introducing versatility and flexibility in the learning algorithms. It also illuminates experimental findings in insect neuroscience that have shown a heterogeneity of learning rules within the insect mushroom body.
Tasks	Meta-Learning
Published	2019-06-04
URL	https://arxiv.org/abs/1906.01668v1
PDF	https://arxiv.org/pdf/1906.01668v1.pdf
PWC	https://paperswithcode.com/paper/neuromorphic-architecture-optimization-for
Repo
Framework

Multimodal Continuation-style Architectures for Human-Robot Interaction


Title	Multimodal Continuation-style Architectures for Human-Robot Interaction
Authors	Nikhil Krishnaswamy, James Pustejovsky
Abstract	We present an architecture for integrating real-time, multimodal input into a computational agent’s contextual model. Using a human-avatar interaction in a virtual world, we treat aligned gesture and speech as an ensemble where content may be communicated by either modality. With a modified nondeterministic pushdown automaton architecture, the computer system: (1) consumes input incrementally using continuation-passing style until it achieves sufficient understanding of the user’s aim; (2) constructs and asks questions where necessary using established contextual information; and (3) keeps track of prior discourse items using multimodal cues. This type of architecture supports special cases of pushdown and finite state automata as well as integrating outputs from machine learning models. We present examples of this architecture’s use in multimodal one-shot learning interactions of novel gestures and live action composition.
Tasks	One-Shot Learning
Published	2019-09-18
URL	https://arxiv.org/abs/1909.08161v1
PDF	https://arxiv.org/pdf/1909.08161v1.pdf
PWC	https://paperswithcode.com/paper/multimodal-continuation-style-architectures
Repo
Framework

Video-based Person Re-identification with Two-stream Convolutional Network and Co-attentive Snippet Embedding


Title	Video-based Person Re-identification with Two-stream Convolutional Network and Co-attentive Snippet Embedding
Authors	Peixian Chen, Pingyang Dai, Qiong Wu, Yuyu Huang
Abstract	Recently, the applications of person re-identification in visual surveillance and human-computer interaction are sharply increasing, which signifies the critical role of such a problem. In this paper, we propose a two-stream convolutional network (ConvNet) based on the competitive similarity aggregation scheme and co-attentive embedding strategy for video-based person re-identification. By dividing the long video sequence into multiple short video snippets, we manage to utilize every snippet’s RGB frames, optical flow maps and pose maps to facilitate residual networks, e.g., ResNet, for feature extraction in the two-stream ConvNet. The extracted features are embedded by the co-attentive embedding method, which allows for the reduction of the effects of noisy frames. Finally, we fuse the outputs of both streams as the embedding of a snippet, and apply competitive snippet-similarity aggregation to measure the similarity between two sequences. Our experiments show that the proposed method significantly outperforms current state-of-the-art approaches on multiple datasets.
Tasks	Optical Flow Estimation, Person Re-Identification, Video-Based Person Re-Identification
Published	2019-05-28
URL	https://arxiv.org/abs/1905.11862v1
PDF	https://arxiv.org/pdf/1905.11862v1.pdf
PWC	https://paperswithcode.com/paper/video-based-person-re-identification-with-two
Repo
Framework

Cleaning Noisy and Heterogeneous Metadata for Record Linking Across Scholarly Big Datasets


Title	Cleaning Noisy and Heterogeneous Metadata for Record Linking Across Scholarly Big Datasets
Authors	Athar Sefid, Jian Wu, Allen C. Ge, Jing Zhao, Lu Liu, Cornelia Caragea, Prasenjit Mitra, C. Lee Giles
Abstract	Automatically extracted metadata from scholarly documents in PDF formats is usually noisy and heterogeneous, often containing incomplete fields and erroneous values. One common way of cleaning metadata is to use a bibliographic reference dataset. The challenge is to match records between corpora with high precision. The existing solution which is based on information retrieval and string similarity on titles works well only if the titles are cleaned. We introduce a system designed to match scholarly document entities with noisy metadata against a reference dataset. The blocking function uses the classic BM25 algorithm to find the matching candidates from the reference data that has been indexed by ElasticSearch. The core components use supervised methods which combine features extracted from all available metadata fields. The system also leverages available citation information to match entities. The combination of metadata and citation achieves high accuracy that significantly outperforms the baseline method on the same test dataset. We apply this system to match the database of CiteSeerX against Web of Science, PubMed, and DBLP. This method will be deployed in the CiteSeerX system to clean metadata and link records to other scholarly big datasets.
Tasks	Information Retrieval
Published	2019-06-20
URL	https://arxiv.org/abs/1906.08470v1
PDF	https://arxiv.org/pdf/1906.08470v1.pdf
PWC	https://paperswithcode.com/paper/cleaning-noisy-and-heterogeneous-metadata-for
Repo
Framework
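
The record-linking abstract above mentions a blocking step that uses BM25 to retrieve candidate matches from an indexed reference set. The sketch below is a small in-memory illustration of Okapi BM25 scoring over titles; it is not the CiteSeerX system and does not use the Elasticsearch API. The whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the smoothed IDF variant are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


titles = [
    "cleaning noisy metadata for record linking",
    "deep learning for object detection",
    "noisy labels in image classification",
]
scores = bm25_scores("noisy metadata record", titles)
best = max(range(len(titles)), key=scores.__getitem__)
```

In a blocking setup, the top-scoring reference titles would be passed on as candidates to a more expensive matcher, which is the role the abstract assigns to its supervised components.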