Paper Group ANR 25
Likelihood-Free Overcomplete ICA and Applications in Causal Discovery. ScarfNet: Multi-scale Features with Deeply Fused and Redistributed Semantics for Enhanced Object Detection. Knowing When to Stop: Evaluation and Verification of Conformity to Output-size Specifications. IPO: Interior-point Policy Optimization under Constraints. A Question Answer …
Likelihood-Free Overcomplete ICA and Applications in Causal Discovery
Title | Likelihood-Free Overcomplete ICA and Applications in Causal Discovery |
Authors | Chenwei Ding, Mingming Gong, Kun Zhang, Dacheng Tao |
Abstract | Causal discovery has witnessed significant progress over the past decades. In particular, many recent causal discovery methods make use of independent, non-Gaussian noise to achieve identifiability of the causal models. The existence of hidden direct common causes, or confounders, generally makes causal discovery more difficult; whenever they are present, the corresponding causal discovery algorithms can be seen as extensions of overcomplete independent component analysis (OICA). However, existing OICA algorithms usually make strong parametric assumptions on the distribution of independent components, which may be violated on real data, leading to sub-optimal or even wrong solutions. In addition, existing OICA algorithms rely on the Expectation Maximization (EM) procedure, which requires computationally expensive inference of the posterior distribution of independent components. To tackle these problems, we present a Likelihood-Free Overcomplete ICA algorithm (LFOICA) that estimates the mixing matrix directly by back-propagation without any explicit assumptions on the density function of independent components. Thanks to its computational efficiency, the proposed method makes a number of causal discovery procedures much more practically feasible. For illustrative purposes, we demonstrate the computational efficiency and efficacy of our method in two causal discovery tasks on both synthetic and real data. |
Tasks | Causal Discovery |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.01525v2 |
PDF | https://arxiv.org/pdf/1909.01525v2.pdf |
PWC | https://paperswithcode.com/paper/likelihood-free-overcomplete-ica-and |
Repo | |
Framework | |
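The entry above describes estimating an overcomplete mixing matrix directly by back-propagation, without parametric assumptions on the independent components. As a rough illustration only, here is a minimal PyTorch sketch (the framework is an assumption; the entry lists none) that draws each component from a small noise-fed generator network and fits the mixing matrix by matching generated mixtures to observed data with an MMD loss; the generator sizes, kernel bandwidth and training loop are invented for the example and are not the paper's actual implementation.

```python
import torch
import torch.nn as nn

def rbf_mmd(x, y, sigma=1.0):
    """Squared MMD with a Gaussian kernel between two batches of samples."""
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

class ComponentGenerator(nn.Module):
    """Maps standard Gaussian noise to samples of one independent component."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    def forward(self, z):
        return self.net(z)

n_obs, n_sources, batch = 2, 4, 256        # overcomplete: more sources than observations
A = nn.Parameter(torch.randn(n_obs, n_sources) * 0.1)   # mixing matrix to be estimated
gens = nn.ModuleList(ComponentGenerator() for _ in range(n_sources))

x_obs = torch.randn(batch, n_obs)          # toy "observed" mixtures; real data would go here

opt = torch.optim.Adam([A, *gens.parameters()], lr=1e-3)
for step in range(200):
    z = torch.randn(batch, n_sources, 1)
    s = torch.cat([g(z[:, i]) for i, g in enumerate(gens)], dim=1)  # independent components
    x_gen = s @ A.t()                                               # generated mixtures
    loss = rbf_mmd(x_gen, x_obs)                                    # distribution matching
    opt.zero_grad(); loss.backward(); opt.step()
```

Gradients of the distribution-matching loss flow through both the generators and the mixing matrix, which is the likelihood-free aspect: no explicit density of the components is ever written down.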
ScarfNet: Multi-scale Features with Deeply Fused and Redistributed Semantics for Enhanced Object Detection
Title | ScarfNet: Multi-scale Features with Deeply Fused and Redistributed Semantics for Enhanced Object Detection |
Authors | Jin Hyeok Yoo, Dongsuk Kum, Jun Won Choi |
Abstract | Convolutional neural networks (CNNs) have led to significant progress in object detection. In order to detect objects of various sizes, object detectors often exploit the hierarchy of multi-scale feature maps called the feature pyramid, which is readily obtained from the CNN architecture. However, the performance of these object detectors is limited because the bottom-level feature maps, which pass through fewer convolutional layers, lack the semantic information needed to capture the characteristics of small objects. To address this problem, various methods have been proposed to increase the depth of the bottom-level features used for object detection. While most approaches are based on the generation of additional features through a top-down pathway with lateral connections, our approach directly fuses multi-scale feature maps using bidirectional long short-term memory (biLSTM) in an effort to generate deeply fused semantics. The resulting semantic information is then redistributed to the individual pyramidal features at each scale through a channel-wise attention model. We integrate our semantic combining and attentive redistribution feature network (ScarfNet) with baseline object detectors, i.e., Faster R-CNN, the single-shot multibox detector (SSD) and RetinaNet. Our experiments show that our method outperforms existing feature pyramid methods as well as the baseline detectors and achieves state-of-the-art performance on the PASCAL VOC and COCO detection benchmarks. |
Tasks | Object Detection |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00328v2 |
PDF | https://arxiv.org/pdf/1908.00328v2.pdf |
PWC | https://paperswithcode.com/paper/scarfnet-multi-scale-features-with-deeply |
Repo | |
Framework | |
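The abstract above fuses pyramid levels with a biLSTM and redistributes the fused semantics with channel-wise attention. The PyTorch sketch below is a loose interpretation under stated assumptions: features are pooled to the coarsest resolution, the biLSTM runs over the scale axis at each spatial location, and a per-level sigmoid gate plus a skip connection performs the redistribution. Channel counts, pooling choices and the exact attention form are guesses, not the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScarfFusion(nn.Module):
    """Fuse pyramid levels with a biLSTM over the scale axis, then redistribute
    the fused semantics to each level through channel-wise attention."""
    def __init__(self, channels, num_levels):
        super().__init__()
        self.lstm = nn.LSTM(channels, channels, batch_first=True, bidirectional=True)
        self.attn = nn.ModuleList(
            nn.Sequential(nn.Linear(2 * channels, channels), nn.Sigmoid())
            for _ in range(num_levels))
        self.proj = nn.ModuleList(nn.Conv2d(2 * channels, channels, 1) for _ in range(num_levels))

    def forward(self, feats):                        # list of [B, C, H_l, W_l], coarse to fine
        B, C, H, W = feats[0].shape                  # use the coarsest resolution as the common size
        pooled = [F.adaptive_avg_pool2d(f, (H, W)) for f in feats]
        seq = torch.stack(pooled, dim=1)             # [B, L, C, H, W]
        L = seq.size(1)
        seq = seq.permute(0, 3, 4, 1, 2).reshape(B * H * W, L, C)
        fused, _ = self.lstm(seq)                    # [B*H*W, L, 2C], deeply fused semantics
        fused = fused.reshape(B, H, W, L, 2 * C).permute(0, 3, 4, 1, 2)  # [B, L, 2C, H, W]
        outs = []
        for l, f in enumerate(feats):
            g = fused[:, l]                                           # fused semantics for level l
            w = self.attn[l](g.mean(dim=(2, 3)))                      # channel attention [B, C]
            g = F.interpolate(self.proj[l](g), size=f.shape[-2:], mode='nearest')
            outs.append(f * w[:, :, None, None] + g)                  # redistribute + skip
        return outs

# Toy pyramid: three levels with 64 channels.
feats = [torch.randn(2, 64, s, s) for s in (8, 16, 32)]
outs = ScarfFusion(64, 3)(feats)
```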
Knowing When to Stop: Evaluation and Verification of Conformity to Output-size Specifications
Title | Knowing When to Stop: Evaluation and Verification of Conformity to Output-size Specifications |
Authors | Chenglong Wang, Rudy Bunel, Krishnamurthy Dvijotham, Po-Sen Huang, Edward Grefenstette, Pushmeet Kohli |
Abstract | Models such as Sequence-to-Sequence and Image-to-Sequence are widely used in real world applications. While the ability of these neural architectures to produce variable-length outputs makes them extremely effective for problems like Machine Translation and Image Captioning, it also leaves them vulnerable to failures in which the model produces outputs of undesirable length. This behavior can have severe consequences, such as increased computation and faults induced in downstream modules that expect outputs of a certain length. Motivated by the need to have a better understanding of the failures of these models, this paper proposes and studies the novel output-size modulation problem and makes two key technical contributions. First, to evaluate model robustness, we develop an easy-to-compute differentiable proxy objective that can be used with gradient-based algorithms to find output-lengthening inputs. Second and more importantly, we develop a verification approach that can formally verify whether a network always produces outputs within a certain length. Experimental results on Machine Translation and Image Captioning show that our output-lengthening approach can produce outputs that are 50 times longer than the input, while our verification approach can, given a model and input domain, prove that the output length is below a certain size. |
Tasks | Image Captioning, Machine Translation |
Published | 2019-04-26 |
URL | http://arxiv.org/abs/1904.12004v1 |
PDF | http://arxiv.org/pdf/1904.12004v1.pdf |
PWC | https://paperswithcode.com/paper/knowing-when-to-stop-evaluation-and |
Repo | |
Framework | |
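The paper above searches for output-lengthening inputs with a differentiable proxy for output length. A minimal sketch of that idea, assuming the proxy is the expected number of steps before end-of-sequence is emitted (the paper's exact objective may differ), is shown below; the "decoder" here is a toy linear map standing in for a real seq2seq model.

```python
import torch

def expected_length_proxy(eos_probs):
    """Differentiable surrogate for output length: sum over steps of the
    probability that EOS has not been emitted yet."""
    survive = torch.cumprod(1.0 - eos_probs, dim=-1)
    return survive.sum(dim=-1)

# Toy stand-in for a decoder: per-step EOS probabilities from an input embedding.
torch.manual_seed(0)
W = torch.randn(16, 8)
def toy_decoder_eos(x, steps=8):
    return torch.sigmoid(x @ W)[..., :steps]

x = torch.zeros(16, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.1)
for _ in range(100):                      # gradient ascent on the proxy -> output-lengthening input
    loss = -expected_length_proxy(toy_decoder_eos(x))
    opt.zero_grad(); loss.backward(); opt.step()
```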
IPO: Interior-point Policy Optimization under Constraints
Title | IPO: Interior-point Policy Optimization under Constraints |
Authors | Yongshuai Liu, Jiaxin Ding, Xin Liu |
Abstract | In this paper, we study reinforcement learning (RL) algorithms to solve real-world decision problems with the objective of maximizing the long-term reward as well as satisfying cumulative constraints. We propose a novel first-order policy optimization method, Interior-point Policy Optimization (IPO), which augments the objective with logarithmic barrier functions, inspired by the interior-point method. Our proposed method is easy to implement with performance guarantees and can handle general types of cumulative multi-constraint settings. We conduct extensive evaluations to compare our approach with state-of-the-art baselines. Our algorithm outperforms the baseline algorithms in terms of reward maximization and constraint satisfaction. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09615v1 |
PDF | https://arxiv.org/pdf/1910.09615v1.pdf |
PWC | https://paperswithcode.com/paper/ipo-interior-point-policy-optimization-under |
Repo | |
Framework | |
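IPO augments the policy objective with logarithmic barrier terms for the cumulative constraints. The snippet below sketches only that augmentation, assuming a precomputed clipped surrogate and constraint estimates; the barrier coefficient `t` and the tensor values are placeholders, and the surrounding policy-gradient machinery is omitted.

```python
import torch

def ipo_objective(surrogate, constraint_estimates, limits, t=10.0):
    """Clipped policy surrogate augmented with logarithmic barriers, one per
    cumulative constraint J_C <= d (interior-point style)."""
    barrier = 0.0
    for j_c, d in zip(constraint_estimates, limits):
        barrier = barrier + torch.log(d - j_c) / t   # -> -inf as J_C approaches its limit d
    return surrogate + barrier

# Toy usage: one reward surrogate and two cost constraints (values are illustrative).
surrogate = torch.tensor(1.3)
objective = ipo_objective(surrogate, [torch.tensor(0.4), torch.tensor(0.1)], [1.0, 0.5])
```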
A Question Answering System Using Graph-Pattern Association Rules (QAGPAR) On YAGO Knowledge Base
Title | A Question Answering System Using Graph-Pattern Association Rules (QAGPAR) On YAGO Knowledge Base |
Authors | Wahyudi, Masayu Leylia Khodra, Ary Setijadi Prihatmanto, Carmadi Machbub |
Abstract | A question answering system (QA system) was developed that uses graph-pattern association rules on the YAGO knowledge base. Given a user question as input, the system provides the answer as output. If the answer is missing or unavailable in the database, graph-pattern association rules are used to obtain it. The architecture of the question answering system comprises question classification, graph component generation, query generation, and query processing. The system was developed with graph-pattern association rules following a waterfall model. In this paper, the architecture of the system is described, specifically discussing its reasoning and performance capabilities. The results of this research show that rules with high confidence and correct logic produce correct answers, and vice versa. |
Tasks | Question Answering |
Published | 2019-02-02 |
URL | http://arxiv.org/abs/1902.00624v1 |
PDF | http://arxiv.org/pdf/1902.00624v1.pdf |
PWC | https://paperswithcode.com/paper/a-question-answering-system-using-graph |
Repo | |
Framework | |
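The QA system above falls back to graph-pattern association rules when the answer is not directly in the knowledge base. The toy Python sketch below illustrates that fallback with a hand-made triple store and a single made-up rule; real YAGO facts, rule mining and confidence scores are not modeled.

```python
# Toy triples in (subject, predicate, object) form, standing in for YAGO facts.
kb = {("Jakarta", "isLocatedIn", "Indonesia"),
      ("Indonesia", "hasCapital", "Jakarta")}

# A graph-pattern association rule: if the body pattern holds, add the head triple.
rules = [
    # (body pattern, head pattern); '?x'/'?y' are variables
    ((("?x", "hasCapital", "?y"),), ("?y", "isCapitalOf", "?x")),
]

def apply_rules(kb, rules):
    """One round of forward chaining with single-triple-body rules."""
    inferred = set(kb)
    for (body,), head in rules:
        for s, p, o in kb:
            if p == body[1]:
                binding = {body[0]: s, body[2]: o}
                inferred.add(tuple(binding.get(t, t) for t in head))
    return inferred

def answer(kb, rules, subject, predicate):
    """Answer directly from the KB; fall back to rule inference if missing."""
    facts = {(s, p): o for s, p, o in kb}
    if (subject, predicate) in facts:
        return facts[(subject, predicate)]
    facts = {(s, p): o for s, p, o in apply_rules(kb, rules)}
    return facts.get((subject, predicate))

print(answer(kb, rules, "Jakarta", "isCapitalOf"))   # inferred: Indonesia
```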
Multi-Modal Three-Stream Network for Action Recognition
Title | Multi-Modal Three-Stream Network for Action Recognition |
Authors | Muhammad Usman Khalid, Jie Yu |
Abstract | Human action recognition in video is an active yet challenging research topic due to the high variation and complexity of the data. In this paper, a novel video-based action recognition framework utilizing complementary cues is proposed to handle this complex problem. Inspired by the success of two-stream networks for action classification, additional pose features are studied and fused to enhance the understanding of human action in a more abstract and semantic way. For practical use, not only ground-truth poses but also noisy estimated poses are incorporated into the framework through our proposed pre-processing module. The whole framework and each cue are evaluated on varied benchmark datasets such as JHMDB, sub-JHMDB and Penn Action. Our results surpass state-of-the-art performance on these datasets and show the strength of complementary cues. |
Tasks | Action Classification, Temporal Action Localization |
Published | 2019-09-08 |
URL | https://arxiv.org/abs/1909.03466v1 |
PDF | https://arxiv.org/pdf/1909.03466v1.pdf |
PWC | https://paperswithcode.com/paper/multi-modal-three-stream-network-for-action |
Repo | |
Framework | |
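The framework above combines RGB, optical-flow and pose cues. The PyTorch sketch below shows only a late-fusion step, with toy linear backbones in place of the real two-stream networks and pose branch; the class count (21, as in JHMDB) and the joint count are assumptions made for the example.

```python
import torch
import torch.nn as nn

class ThreeStreamFusion(nn.Module):
    """Late fusion of RGB, optical-flow and pose streams by averaging class scores."""
    def __init__(self, rgb_net, flow_net, pose_net):
        super().__init__()
        self.rgb_net, self.flow_net, self.pose_net = rgb_net, flow_net, pose_net

    def forward(self, rgb, flow, pose):
        scores = [self.rgb_net(rgb), self.flow_net(flow), self.pose_net(pose)]
        return torch.stack(scores).mean(dim=0)     # complementary cues combined

# Toy stand-ins for the three backbones (each outputs 21 class scores, as in JHMDB).
num_classes = 21
rgb_net  = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, num_classes))
flow_net = nn.Sequential(nn.Flatten(), nn.Linear(2 * 32 * 32, num_classes))
pose_net = nn.Sequential(nn.Linear(2 * 15, num_classes))   # 15 joints, (x, y) each

model = ThreeStreamFusion(rgb_net, flow_net, pose_net)
out = model(torch.randn(4, 3, 32, 32), torch.randn(4, 2, 32, 32), torch.randn(4, 30))
```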
Second Order Value Iteration in Reinforcement Learning
Title | Second Order Value Iteration in Reinforcement Learning |
Authors | Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar |
Abstract | Value iteration is a fixed point iteration technique utilized to obtain the optimal value function and policy in a discounted reward Markov Decision Process (MDP). Here, a contraction operator is constructed and applied repeatedly to arrive at the optimal solution. Value iteration is a first order method and therefore may take a large number of iterations to converge to the optimal solution. In this work, we propose a novel second order value iteration procedure based on the Newton-Raphson method. We first construct a modified contraction operator and then apply the Newton-Raphson method to arrive at our algorithm. We prove the global convergence of our algorithm to the optimal solution and establish its second order convergence. Through experiments, we demonstrate the effectiveness of our proposed approach. |
Tasks | |
Published | 2019-05-10 |
URL | https://arxiv.org/abs/1905.03927v1 |
PDF | https://arxiv.org/pdf/1905.03927v1.pdf |
PWC | https://paperswithcode.com/paper/second-order-value-iteration-in-reinforcement |
Repo | |
Framework | |
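The paper above applies Newton-Raphson to a modified contraction operator. The NumPy sketch below uses the classical connection between Newton's method on the Bellman equation and policy-iteration-style updates on a small random MDP; it illustrates the second-order idea but not the paper's specific modified operator.

```python
import numpy as np

def second_order_vi(P, R, gamma, iters=30, tol=1e-10):
    """Newton-Raphson on G(V) = T(V) - V, where T is the Bellman optimality operator.
    At V, the (generalized) Jacobian of T is gamma * P_pi for the greedy policy pi,
    so each Newton step solves (I - gamma * P_pi) V_new = R_pi."""
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * np.einsum('ast,t->as', P, V)   # action values Q[a, s]
        pi = Q.argmax(axis=0)                          # greedy policy
        P_pi = P[pi, np.arange(S)]                     # [S, S] transitions under pi
        R_pi = R[pi, np.arange(S)]
        V_new = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Toy random MDP with 2 actions and 4 states.
rng = np.random.default_rng(0)
P = rng.random((2, 4, 4)); P /= P.sum(axis=2, keepdims=True)
R = rng.random((2, 4))
print(second_order_vi(P, R, gamma=0.9))
```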
A Multi-Modal Feature Embedding Approach to Diagnose Alzheimer Disease from Spoken Language
Title | A Multi-Modal Feature Embedding Approach to Diagnose Alzheimer Disease from Spoken Language |
Authors | S. Soroush Haj Zargarbashi, Bagher Babaali |
Abstract | Introduction: Alzheimer’s disease is a type of dementia in which early diagnosis plays a major role in the quality of treatment. Among recent work on the diagnosis of Alzheimer’s disease, many studies analyze the voice stream acoustically, syntactically or both. The tools most commonly used to perform these analyses include machine learning techniques. Objective: Designing an automatic machine-learning-based diagnosis system will help in the procedure of early detection; systems using noninvasive data are also preferable. Methods: We use a classification system based on spoken language, with three (statistical and neural) approaches to classify audio signals from spoken language into the two classes of dementia and control. Results: This work designs a multi-modal feature embedding of the spoken-language audio signal using three approaches: N-gram, i-vector, and x-vector. The system is evaluated on the cookie picture description task from the Pitt Corpus dementia bank with an accuracy of 83.6%. |
Tasks | |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00330v1 |
PDF | https://arxiv.org/pdf/1910.00330v1.pdf |
PWC | https://paperswithcode.com/paper/a-multi-modal-feature-embedding-approach-to |
Repo | |
Framework | |
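The study above combines N-gram, i-vector and x-vector embeddings of spoken language for dementia-vs-control classification. The scikit-learn sketch below only illustrates the shape of such a pipeline: the feature matrices are random stand-ins for the real embeddings, and the SVM and cross-validation settings are arbitrary choices, not the paper's setup.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Stand-in features: in the real system these would be n-gram, i-vector and
# x-vector embeddings extracted from each speaker's recording/transcript.
rng = np.random.default_rng(0)
n_speakers = 100
ngram_feats   = rng.normal(size=(n_speakers, 50))
ivector_feats = rng.normal(size=(n_speakers, 100))
xvector_feats = rng.normal(size=(n_speakers, 128))
labels = rng.integers(0, 2, size=n_speakers)        # 0 = control, 1 = dementia

X = np.hstack([ngram_feats, ivector_feats, xvector_feats])   # multi-modal embedding
clf = make_pipeline(StandardScaler(), SVC(kernel='linear'))
print(cross_val_score(clf, X, labels, cv=5).mean())
```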
Finite-Time Analysis and Restarting Scheme for Linear Two-Time-Scale Stochastic Approximation
Title | Finite-Time Analysis and Restarting Scheme for Linear Two-Time-Scale Stochastic Approximation |
Authors | Thinh T. Doan |
Abstract | Motivated by their broad applications in reinforcement learning, we study the linear two-time-scale stochastic approximation, an iterative method using two different step sizes for finding the solutions of a system of two equations. Our main focus is to characterize the finite-time complexity of this method under time-varying step sizes and Markovian noise. In particular, we show that the mean square errors of the variables generated by the method converge to zero at a sublinear rate $\mathcal{O}(k^{-2/3})$, where $k$ is the number of iterations. We then improve the performance of this method by considering a restarting scheme, where we restart the algorithm after every predetermined number of iterations. We show that, using this restarting scheme, the complexity of the algorithm under time-varying step sizes is as good as that under constant step sizes, while still achieving exact convergence to the desired solution. Moreover, the restarting scheme also helps to prevent the step sizes from becoming too small, which is useful for the practical implementation of linear two-time-scale stochastic approximation. |
Tasks | |
Published | 2019-12-23 |
URL | https://arxiv.org/abs/1912.10583v2 |
PDF | https://arxiv.org/pdf/1912.10583v2.pdf |
PWC | https://paperswithcode.com/paper/finite-time-analysis-and-restarting-scheme |
Repo | |
Framework | |
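The paper above studies linear two-time-scale stochastic approximation with a restarting scheme for the step sizes. The NumPy sketch below runs the two coupled updates with diminishing step sizes whose counter is reset periodically; the step-size exponents, restart interval and toy system are illustrative assumptions, not the paper's analysis.

```python
import numpy as np

def two_time_scale_sa(A11, A12, A21, A22, b1, b2, iters=5000, restart_every=500,
                      noise=0.01, seed=0):
    """Linear two-time-scale stochastic approximation with time-varying step sizes.
    The step-size counter is reset every `restart_every` iterations (restarting scheme),
    which keeps the step sizes from decaying too far."""
    rng = np.random.default_rng(seed)
    x = np.zeros(b1.shape)
    y = np.zeros(b2.shape)
    for k in range(1, iters + 1):
        t = (k - 1) % restart_every + 1          # restarted iteration counter
        alpha = 1.0 / t                          # step size for x
        beta = 1.0 / t ** (2 / 3)                # larger step size for y (the faster time scale)
        x = x + alpha * (A11 @ x + A12 @ y + b1 + noise * rng.standard_normal(x.shape))
        y = y + beta  * (A21 @ x + A22 @ y + b2 + noise * rng.standard_normal(y.shape))
    return x, y

# Toy 1-D system whose fixed point solves A11 x + A12 y + b1 = 0, A21 x + A22 y + b2 = 0.
A11, A12, A21, A22 = np.array([[-1.0]]), np.array([[0.5]]), np.array([[0.2]]), np.array([[-1.0]])
b1, b2 = np.array([1.0]), np.array([0.5])
print(two_time_scale_sa(A11, A12, A21, A22, b1, b2))
```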
Neural News Recommendation with Attentive Multi-View Learning
Title | Neural News Recommendation with Attentive Multi-View Learning |
Authors | Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang, Xing Xie |
Abstract | Personalized news recommendation is very important for online news platforms to help users find news of interest and to improve user experience. News and user representation learning is critical for news recommendation. Existing news recommendation methods usually learn these representations based on a single kind of news information, e.g., the title, which may be insufficient. In this paper, we propose a neural news recommendation approach which can learn informative representations of users and news by exploiting different kinds of news information. The core of our approach is a news encoder and a user encoder. In the news encoder, we propose an attentive multi-view learning model to learn unified news representations from titles, bodies and topic categories by regarding them as different views of the news. In addition, we apply both word-level and view-level attention mechanisms in the news encoder to select important words and views for learning informative news representations. In the user encoder, we learn the representations of users based on their browsed news and apply an attention mechanism to select informative news for user representation learning. Extensive experiments on a real-world dataset show that our approach can effectively improve the performance of news recommendation. |
Tasks | Multi-View Learning, Representation Learning |
Published | 2019-07-12 |
URL | https://arxiv.org/abs/1907.05576v1 |
PDF | https://arxiv.org/pdf/1907.05576v1.pdf |
PWC | https://paperswithcode.com/paper/neural-news-recommendation-with-attentive |
Repo | |
Framework | |
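The news encoder above learns a unified representation from titles, bodies and topic categories with word-level and view-level attention. The PyTorch sketch below is a simplified version under assumptions: one shared additive attention pools words per view (the paper uses CNNs and separate attentions per view), and the embedding sizes and vocabulary are invented for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Additive attention pooling: weights a set of vectors and sums them."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.proj = nn.Linear(dim, hidden)
        self.query = nn.Linear(hidden, 1, bias=False)
    def forward(self, x):                       # x: [B, N, dim]
        scores = self.query(torch.tanh(self.proj(x))).squeeze(-1)   # [B, N]
        weights = F.softmax(scores, dim=-1)
        return (weights.unsqueeze(-1) * x).sum(dim=1)               # [B, dim]

class MultiViewNewsEncoder(nn.Module):
    """Encodes title words, body words and a topic category as separate views,
    then fuses them with view-level attention."""
    def __init__(self, vocab, n_categories, dim=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, dim)
        self.cat_emb = nn.Embedding(n_categories, dim)
        self.word_attn = AdditiveAttention(dim)   # shared word-level attention (simplification)
        self.view_attn = AdditiveAttention(dim)
    def forward(self, title_ids, body_ids, category_id):
        title = self.word_attn(self.word_emb(title_ids))         # [B, dim]
        body = self.word_attn(self.word_emb(body_ids))           # [B, dim]
        cat = self.cat_emb(category_id)                          # [B, dim]
        views = torch.stack([title, body, cat], dim=1)           # [B, 3, dim]
        return self.view_attn(views)                             # unified news representation

enc = MultiViewNewsEncoder(vocab=5000, n_categories=20)
news_vec = enc(torch.randint(0, 5000, (4, 10)), torch.randint(0, 5000, (4, 50)),
               torch.randint(0, 20, (4,)))
```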
Attention: A Big Surprise for Cross-Domain Person Re-Identification
Title | Attention: A Big Surprise for Cross-Domain Person Re-Identification |
Authors | Haijun Liu, Jian Cheng, Shiguang Wang, Wen Wang |
Abstract | In this paper, we focus on model generalization and adaptation for cross-domain person re-identification (Re-ID). Unlike existing cross-domain Re-ID methods, which leverage auxiliary information from unlabeled target-domain data, we aim at enhancing model generalization and adaptation through discriminative feature learning, directly applying a pre-trained model to new domains (datasets) without using any information from the target domains. To address the discriminative feature learning problem, we surprisingly find that simply introducing an attention mechanism to adaptively extract person features for every domain is highly effective. We adopt two popular types of attention mechanisms: long-range-dependency-based attention and direct-generation-based attention. Both can apply attention along the spatial or channel dimension alone, or along a combination of the two. The structures of the different attention mechanisms are clearly illustrated. Moreover, we also incorporate the attention results into the final output of the model through a skip connection to improve the features with both high- and middle-level semantic visual information. When directly applying a pre-trained model to new domains, this attention incorporation truly enhances model generalization and adaptation for cross-domain person Re-ID. We conduct extensive experiments across three large datasets: Market-1501, DukeMTMC-reID and MSMT17. Surprisingly, introducing only attention can achieve state-of-the-art performance, even much better than those cross-domain Re-ID methods that utilize auxiliary information from the target domain. |
Tasks | Person Re-Identification |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.12830v1 |
PDF | https://arxiv.org/pdf/1905.12830v1.pdf |
PWC | https://paperswithcode.com/paper/attention-a-big-surprise-for-cross-domain |
Repo | |
Framework | |
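The paper above reports that simply adding attention to the backbone features helps cross-domain Re-ID. The sketch below shows one of the cheaper variants mentioned, a direct-generation (squeeze-and-excitation style) channel attention with a skip connection; the reduction ratio and feature shape are arbitrary, and the long-range and spatial variants are not shown.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention with a skip connection,
    i.e. a 'direct generation' attention over the channel dimension."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
    def forward(self, x):                      # x: [B, C, H, W]
        w = self.fc(x.mean(dim=(2, 3)))        # squeeze -> excitation weights [B, C]
        return x + x * w[:, :, None, None]     # attend, then skip-connect

feat = torch.randn(8, 256, 24, 12)             # a mid-level Re-ID feature map
out = ChannelAttention(256)(feat)
```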
Neuromorphic Architecture Optimization for Task-Specific Dynamic Learning
Title | Neuromorphic Architecture Optimization for Task-Specific Dynamic Learning |
Authors | Sandeep Madireddy, Angel Yanguas-Gil, Prasanna Balaprakash |
Abstract | The ability to learn and adapt in real time is a central feature of biological systems. Neuromorphic architectures demonstrating such versatility can greatly enhance our ability to efficiently process information at the edge. A key challenge, however, is to understand which learning rules are best suited for specific tasks and how the relevant hyperparameters can be fine-tuned. In this work, we introduce a conceptual framework in which the learning process is integrated into the network itself. This allows us to cast meta-learning as a mathematical optimization problem. We employ DeepHyper, a scalable, asynchronous model-based search, to simultaneously optimize the choice of meta-learning rules and their hyperparameters. We demonstrate our approach with two different datasets, MNIST and FashionMNIST, using a network architecture inspired by the learning center of the insect brain. Our results show that optimal learning rules can be dataset-dependent even within similar tasks. This dependency demonstrates the importance of introducing versatility and flexibility in the learning algorithms. It also illuminates experimental findings in insect neuroscience that have shown a heterogeneity of learning rules within the insect mushroom body. |
Tasks | Meta-Learning |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01668v1 |
PDF | https://arxiv.org/pdf/1906.01668v1.pdf |
PWC | https://paperswithcode.com/paper/neuromorphic-architecture-optimization-for |
Repo | |
Framework | |
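The work above casts the choice of learning rule and its hyperparameters as an optimization problem solved with DeepHyper. The plain-Python sketch below substitutes naive random search for DeepHyper's asynchronous model-based search just to show the shape of the search space and objective; the rule names, ranges and scoring function are entirely made up.

```python
import random

# Hypothetical search space: which plasticity rule to use and its hyperparameters.
search_space = {
    "rule": ["hebbian", "oja", "reward_modulated"],
    "learning_rate": (1e-4, 1e-1),   # sampled log-uniformly below
    "decay": (0.0, 0.5),
}

def sample_config(rng):
    return {
        "rule": rng.choice(search_space["rule"]),
        "learning_rate": 10 ** rng.uniform(-4, -1),
        "decay": rng.uniform(*search_space["decay"]),
    }

def evaluate(config):
    """Placeholder objective: in the real setup this trains the insect-brain-inspired
    network on MNIST/FashionMNIST with the given rule and returns its accuracy."""
    rule_bonus = {"hebbian": 0.0, "oja": 0.05, "reward_modulated": 0.1}[config["rule"]]
    return rule_bonus - abs(config["learning_rate"] - 0.01) - 0.1 * config["decay"]

rng = random.Random(0)
best = max((sample_config(rng) for _ in range(50)), key=evaluate)
print(best)
```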
Multimodal Continuation-style Architectures for Human-Robot Interaction
Title | Multimodal Continuation-style Architectures for Human-Robot Interaction |
Authors | Nikhil Krishnaswamy, James Pustejovsky |
Abstract | We present an architecture for integrating real-time, multimodal input into a computational agent’s contextual model. Using a human-avatar interaction in a virtual world, we treat aligned gesture and speech as an ensemble where content may be communicated by either modality. With a modified nondeterministic pushdown automaton architecture, the computer system: (1) consumes input incrementally using continuation-passing style until it achieves a sufficient understanding of the user’s aim; (2) constructs and asks questions where necessary using established contextual information; and (3) keeps track of prior discourse items using multimodal cues. This type of architecture supports special cases of pushdown and finite state automata as well as the integration of outputs from machine learning models. We present examples of this architecture’s use in multimodal one-shot learning interactions of novel gestures and live action composition. |
Tasks | One-Shot Learning |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08161v1 |
PDF | https://arxiv.org/pdf/1909.08161v1.pdf |
PWC | https://paperswithcode.com/paper/multimodal-continuation-style-architectures |
Repo | |
Framework | |
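The architecture above consumes multimodal input incrementally in continuation-passing style and asks questions when understanding is insufficient. The toy Python sketch below captures only that control flow, with a dictionary context and a closure as the continuation; the real system's pushdown automaton and multimodal semantics are far richer.

```python
def consume(events, context=None):
    """Continuation-passing sketch: consume interpreted gesture/speech events
    incrementally; return a resolved action, or a clarifying question plus a
    continuation that waits for more input."""
    context = dict(context or {})
    for slot, value in events:                   # e.g. ('action', 'grab'), ('object', 'red block')
        context[slot] = value                    # either modality may fill either slot
    if "action" in context and "object" in context:
        return ("act", f"{context['action']} the {context['object']}"), None
    missing = "object" if "action" in context else "action"
    return ("ask", f"Which {missing} do you mean?"), lambda more: consume(more, context)

# Speech arrives first with only the verb; the agent asks, then a gesture resolves it.
result, cont = consume([("action", "grab")])
print(result)                                    # ('ask', 'Which object do you mean?')
result, cont = cont([("object", "red block")])
print(result)                                    # ('act', 'grab the red block')
```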
Video-based Person Re-identification with Two-stream Convolutional Network and Co-attentive Snippet Embedding
Title | Video-based Person Re-identification with Two-stream Convolutional Network and Co-attentive Snippet Embedding |
Authors | Peixian Chen, Pingyang Dai, Qiong Wu, Yuyu Huang |
Abstract | Recently, applications of person re-identification in visual surveillance and human-computer interaction have been increasing sharply, which signifies the critical importance of this problem. In this paper, we propose a two-stream convolutional network (ConvNet) based on a competitive similarity aggregation scheme and a co-attentive embedding strategy for video-based person re-identification. By dividing the long video sequence into multiple short video snippets, we utilize every snippet’s RGB frames, optical flow maps and pose maps as inputs to residual networks, e.g., ResNet, for feature extraction in the two-stream ConvNet. The extracted features are embedded by the co-attentive embedding method, which reduces the effects of noisy frames. Finally, we fuse the outputs of both streams as the embedding of a snippet, and apply competitive snippet-similarity aggregation to measure the similarity between two sequences. Our experiments show that the proposed method significantly outperforms current state-of-the-art approaches on multiple datasets. |
Tasks | Optical Flow Estimation, Person Re-Identification, Video-Based Person Re-Identification |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11862v1 |
PDF | https://arxiv.org/pdf/1905.11862v1.pdf |
PWC | https://paperswithcode.com/paper/video-based-person-re-identification-with-two |
Repo | |
Framework | |
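The method above scores two videos by competitive snippet-similarity aggregation over snippet embeddings. The NumPy sketch below illustrates one plausible reading: cosine similarities between all snippet pairs, aggregated over the most similar pairs; the top-k form and the random embeddings are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

def snippet_similarity(seq_a, seq_b, top_k=3):
    """Competitive snippet-similarity aggregation: cosine similarity between every
    pair of snippet embeddings, keeping only the top-k most similar pairs."""
    a = seq_a / np.linalg.norm(seq_a, axis=1, keepdims=True)
    b = seq_b / np.linalg.norm(seq_b, axis=1, keepdims=True)
    sims = a @ b.T                                   # [n_a, n_b] pairwise cosine similarities
    return np.sort(sims.ravel())[-top_k:].mean()     # competitive: the best pairs dominate

# Toy snippet embeddings (each row = fused two-stream embedding of one snippet).
rng = np.random.default_rng(0)
person_1 = rng.normal(size=(5, 128))
person_2 = rng.normal(size=(4, 128))
print(snippet_similarity(person_1, person_2))
```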
Cleaning Noisy and Heterogeneous Metadata for Record Linking Across Scholarly Big Datasets
Title | Cleaning Noisy and Heterogeneous Metadata for Record Linking Across Scholarly Big Datasets |
Authors | Athar Sefid, Jian Wu, Allen C. Ge, Jing Zhao, Lu Liu, Cornelia Caragea, Prasenjit Mitra, C. Lee Giles |
Abstract | Automatically extracted metadata from scholarly documents in PDF format is usually noisy and heterogeneous, often containing incomplete fields and erroneous values. One common way of cleaning metadata is to use a bibliographic reference dataset. The challenge is to match records between corpora with high precision. The existing solution, which is based on information retrieval and string similarity on titles, works well only if the titles are clean. We introduce a system designed to match scholarly document entities with noisy metadata against a reference dataset. The blocking function uses the classic BM25 algorithm to find matching candidates from the reference data, which has been indexed by Elasticsearch. The core components use supervised methods that combine features extracted from all available metadata fields. The system also leverages available citation information to match entities. The combination of metadata and citation information achieves high accuracy that significantly outperforms the baseline method on the same test dataset. We apply this system to match the database of CiteSeerX against Web of Science, PubMed, and DBLP. This method will be deployed in the CiteSeerX system to clean metadata and link records to other scholarly big datasets. |
Tasks | Information Retrieval |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08470v1 |
PDF | https://arxiv.org/pdf/1906.08470v1.pdf |
PWC | https://paperswithcode.com/paper/cleaning-noisy-and-heterogeneous-metadata-for |
Repo | |
Framework | |
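The system above blocks candidates with BM25 over an Elasticsearch index and then matches record pairs with supervised features over all available metadata fields plus citations. The sketch below covers only the pairwise-matching stage, with a handful of made-up similarity features and toy records; the real feature set, blocking step and training data are not reproduced.

```python
from difflib import SequenceMatcher
import numpy as np
from sklearn.linear_model import LogisticRegression

def pair_features(record, candidate):
    """Similarity features over the available metadata fields of a record pair."""
    return [
        SequenceMatcher(None, record["title"], candidate["title"]).ratio(),
        SequenceMatcher(None, record["authors"], candidate["authors"]).ratio(),
        float(record.get("year") == candidate.get("year")),
        len(set(record["citations"]) & set(candidate["citations"])),   # shared citations
    ]

# Toy training pairs: (noisy extracted record, reference record, is_match).
pairs = [
    ({"title": "deep lerning for nlp", "authors": "j smith", "year": 2018, "citations": {"c1"}},
     {"title": "Deep Learning for NLP", "authors": "J. Smith", "year": 2018, "citations": {"c1"}}, 1),
    ({"title": "deep lerning for nlp", "authors": "j smith", "year": 2018, "citations": {"c1"}},
     {"title": "Graph Neural Networks", "authors": "A. Lee", "year": 2020, "citations": {"c9"}}, 0),
]
X = np.array([pair_features(r, c) for r, c, _ in pairs])
y = np.array([label for _, _, label in pairs])
clf = LogisticRegression().fit(X, y)       # supervised matcher over the pair features
```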