July 29, 2019

2837 words 14 mins read

Paper Group ANR 152


Video Salient Object Detection Using Spatiotemporal Deep Features. Discovering Visual Concept Structure with Sparse and Incomplete Tags. Multi-modal analysis of genetically-related subjects using SIFT descriptors in brain MRI. Mining Object Parts from CNNs via Active Question-Answering. Pre-training Attention Mechanisms. Learning to Detect Human-Ob …

Video Salient Object Detection Using Spatiotemporal Deep Features

Title Video Salient Object Detection Using Spatiotemporal Deep Features
Authors Trung-Nghia Le, Akihiro Sugimoto
Abstract This paper presents a method for detecting salient objects in videos where temporal information, in addition to spatial information, is fully taken into account. Following recent reports on the advantage of deep features over conventional hand-crafted features, we propose a new set of SpatioTemporal Deep (STD) features that utilize local and global contexts over frames. We also propose a new SpatioTemporal Conditional Random Field (STCRF) to compute saliency from STD features. STCRF is our extension of the CRF to the temporal domain and describes the relationships among neighboring regions both within a frame and over frames. STCRF leads to temporally consistent saliency maps over frames, contributing to the accurate detection of salient objects’ boundaries and to noise reduction during detection. Our proposed method first segments an input video at multiple scales and then computes a saliency map at each scale level using STD features with STCRF. The final saliency map is computed by fusing the saliency maps at different scale levels. Our experiments, using publicly available benchmark datasets, confirm that the proposed method significantly outperforms state-of-the-art methods. We also applied our saliency computation to the video object segmentation task, showing that our method outperforms existing video object segmentation methods.
Tasks Object Detection, Salient Object Detection, Semantic Segmentation, Video Object Segmentation, Video Salient Object Detection, Video Semantic Segmentation
Published 2017-08-04
URL http://arxiv.org/abs/1708.01447v3
PDF http://arxiv.org/pdf/1708.01447v3.pdf
PWC https://paperswithcode.com/paper/video-salient-object-detection-using
Repo
Framework
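
The full STCRF inference is beyond a short snippet, but the multi-scale fusion and temporal-consistency ideas can be illustrated with a much simpler stand-in: average the saliency maps across scales, then smooth them over frames with an exponential moving average. This is a hedged sketch of the general idea, not the paper's STD/STCRF pipeline; array shapes and the smoothing factor are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy per-scale, per-frame saliency maps: (n_scales, n_frames, H, W), values in [0, 1].
sal = rng.random((3, 30, 64, 64))

def fuse_and_smooth(per_scale_saliency, alpha=0.6):
    """Average saliency over scales, then enforce temporal consistency with an
    exponential moving average over frames (a crude stand-in for STCRF smoothing)."""
    fused = per_scale_saliency.mean(axis=0)          # (n_frames, H, W)
    smoothed = np.empty_like(fused)
    smoothed[0] = fused[0]
    for t in range(1, fused.shape[0]):
        smoothed[t] = alpha * fused[t] + (1 - alpha) * smoothed[t - 1]
    return smoothed

maps = fuse_and_smooth(sal)
```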

Discovering Visual Concept Structure with Sparse and Incomplete Tags

Title Discovering Visual Concept Structure with Sparse and Incomplete Tags
Authors Jingya Wang, Xiatian Zhu, Shaogang Gong
Abstract Automatically discovering the semantic structure of tagged visual data (e.g. web videos and images) is important for visual data analysis and interpretation, enabling machine intelligence to effectively process the fast-growing amount of multi-media data. However, this is non-trivial due to the need to jointly learn the underlying correlations between heterogeneous visual and tag data. The task is made more challenging by inherently sparse and incomplete tags. In this work, we develop a method for modelling the inherent visual data concept structures based on a novel Hierarchical-Multi-Label Random Forest model capable of correlating structured visual and tag information so as to more accurately interpret the visual semantics, e.g. disclosing meaningful visual groups with similar high-level concepts, and recovering missing tags for individual visual data samples. Specifically, our model exploits hierarchically structured tags of different semantic abstractness and multiple tag statistical correlations, in addition to modelling visual and tag interactions. As a result, our model is able to discover more accurate semantic correlations between textual tags and visual features, and thus provides a favourable interpretation of visual semantics even with highly sparse and incomplete tags. We demonstrate the advantages of our proposed approach in two fundamental applications, visual data clustering and missing tag completion, on benchmark video (i.e. TRECVID MED 2011) and image (i.e. NUS-WIDE) datasets.
Tasks
Published 2017-05-30
URL http://arxiv.org/abs/1705.10659v1
PDF http://arxiv.org/pdf/1705.10659v1.pdf
PWC https://paperswithcode.com/paper/discovering-visual-concept-structure-with
Repo
Framework
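
As a rough, flat analogue of the Hierarchical-Multi-Label Random Forest (ignoring the tag hierarchy and tag correlations the paper exploits), a multi-output random forest can already be used to complete missing tags from visual features. Everything below is toy data; it only illustrates the tag-completion step, not the paper's model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 32))                    # toy visual features
Y = (rng.random((300, 6)) < 0.3).astype(int)      # sparse binary tag indicator matrix

# One forest over all tags (multi-output): each tree sees the visual features and
# predicts the full tag vector, loosely coupling the tags.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, Y)

# Tag completion for new samples: rank candidate tags by predicted probability.
probas = np.column_stack([p[:, 1] for p in forest.predict_proba(X[:5])])
completed = (probas > 0.5).astype(int)
```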

Multi-modal analysis of genetically-related subjects using SIFT descriptors in brain MRI

Title Multi-modal analysis of genetically-related subjects using SIFT descriptors in brain MRI
Authors Kuldeep Kumar, Laurent Chauvin, Mathew Toews, Olivier Colliot, Christian Desrosiers
Abstract So far, fingerprinting studies have focused on identifying features from single-modality MRI data, which capture individual characteristics in terms of brain structure, function, or white matter microstructure. However, due to the lack of a framework for comparing across multiple modalities, studies based on multi-modal data remain elusive. This paper presents a multi-modal analysis of genetically-related subjects to compare and contrast the information provided by various MRI modalities. The proposed framework represents MRI scans as bags of SIFT features, and uses these features in a nearest-neighbor graph to measure subject similarity. Experiments using the T1/T2-weighted MRI and diffusion MRI data of 861 Human Connectome Project subjects demonstrate strong links between the proposed similarity measure and genetic proximity.
Tasks
Published 2017-09-18
URL http://arxiv.org/abs/1709.06151v1
PDF http://arxiv.org/pdf/1709.06151v1.pdf
PWC https://paperswithcode.com/paper/multi-modal-analysis-of-genetically-related
Repo
Framework
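
A stripped-down version of the bag-of-SIFT-features comparison: each subject is a set of local descriptors, and two subjects are compared by how close each descriptor is to its nearest neighbour in the other bag. The descriptors here are random stand-ins; the real framework builds a nearest-neighbour graph over actual SIFT features extracted from MRI volumes.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
# Toy stand-ins for per-subject bags of 128-D SIFT descriptors (varying bag sizes).
subjects = {name: rng.normal(size=(rng.integers(80, 120), 128)) for name in ["s1", "s2", "s3"]}

def bag_similarity(a, b):
    """Symmetrised mean distance of each descriptor to its nearest neighbour in the other bag."""
    d = cdist(a, b)
    return -0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

names = list(subjects)
sim = np.array([[bag_similarity(subjects[i], subjects[j]) for j in names] for i in names])
```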

Mining Object Parts from CNNs via Active Question-Answering

Title Mining Object Parts from CNNs via Active Question-Answering
Authors Quanshi Zhang, Ruiming Cao, Ying Nian Wu, Song-Chun Zhu
Abstract Given a convolutional neural network (CNN) that is pre-trained for object classification, this paper proposes to use active question-answering to semanticize neural patterns in conv-layers of the CNN and mine part concepts. For each part concept, we mine neural patterns in the pre-trained CNN that are related to the target part, and use these patterns to construct an And-Or graph (AOG) to represent a four-layer semantic hierarchy of the part. As an interpretable model, the AOG associates different CNN units with different explicit object parts. We use active human-computer communication to incrementally grow such an AOG on the pre-trained CNN as follows. We allow the computer to actively identify objects whose neural patterns cannot be explained by the current AOG. Then, the computer asks the human about the unexplained objects, and uses the answers to automatically discover certain CNN patterns corresponding to the missing knowledge. We incrementally grow the AOG to encode new knowledge discovered during the active-learning process. In experiments, our method exhibits high learning efficiency. Our method uses about 1/6-1/3 of the part annotations for training, but achieves part-localization performance similar to or better than that of fast-RCNN methods.
Tasks Active Learning, Object Classification, Question Answering
Published 2017-04-11
URL http://arxiv.org/abs/1704.03173v1
PDF http://arxiv.org/pdf/1704.03173v1.pdf
PWC https://paperswithcode.com/paper/mining-object-parts-from-cnns-via-active
Repo
Framework
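
The AOG construction itself is involved, but the active question-answering loop described in the abstract has a simple shape: score how well the current model explains each object, ask the human (here a simulated oracle) about the worst-explained ones, and retrain. The sketch below uses plain uncertainty sampling with a logistic-regression "part" classifier as a stand-in for the AOG explanation score; it is not the paper's method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))
true_parts = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # simulated oracle answers

# Small initial annotation set containing both classes.
labeled = list(np.where(true_parts == 0)[0][:5]) + list(np.where(true_parts == 1)[0][:5])

for round_ in range(5):
    clf = LogisticRegression().fit(X[labeled], true_parts[labeled])
    proba = clf.predict_proba(X)[:, 1]
    # "Unexplained" objects = those the current model is least certain about.
    unexplained = np.argsort(np.abs(proba - 0.5))
    queries = [i for i in unexplained if i not in set(labeled)][:20]
    labeled.extend(queries)                               # ask the oracle, absorb the answers

print("accuracy:", clf.score(X, true_parts))
```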

Pre-training Attention Mechanisms

Title Pre-training Attention Mechanisms
Authors Jack Lindsey
Abstract Recurrent neural networks with differentiable attention mechanisms have had success in generative and classification tasks. We show that the classification performance of such models can be enhanced by guiding a randomly initialized model to attend to salient regions of the input in early training iterations. We further show that, if explicit heuristics for guidance are unavailable, a model that is pretrained on an unsupervised reconstruction task can discover good attention policies without supervision. We demonstrate that increased efficiency of the attention mechanism itself contributes to these performance improvements. Based on these insights, we introduce bootstrapped glimpse mimicking, a simple, theoretically task-general method of more effectively training attention models. Our work draws inspiration from and parallels results on human learning of attention.
Tasks
Published 2017-12-15
URL http://arxiv.org/abs/1712.05652v1
PDF http://arxiv.org/pdf/1712.05652v1.pdf
PWC https://paperswithcode.com/paper/pre-training-attention-mechanisms
Repo
Framework
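
A minimal sketch of the guidance idea, assuming a toy soft-attention classifier over patch features rather than the paper's recurrent glimpse model: early in training, an auxiliary KL term pulls the attention distribution towards a salient-region prior, and its weight decays to zero so the task loss takes over. The prior here is random; in practice it would come from an explicit heuristic or from the unsupervised reconstruction pre-training the paper describes.

```python
import torch
import torch.nn.functional as F

# Toy setup: each input is a set of P patch features; the model attends over patches,
# pools, and classifies. `saliency_prior` is a hypothetical per-patch heuristic.
P, D, C, N = 16, 32, 4, 256
torch.manual_seed(0)
x = torch.randn(N, P, D)
y = torch.randint(0, C, (N,))
saliency_prior = F.softmax(torch.randn(N, P), dim=1)   # stand-in for a salient-region prior

score = torch.nn.Linear(D, 1)    # produces one attention logit per patch
clf = torch.nn.Linear(D, C)
opt = torch.optim.Adam(list(score.parameters()) + list(clf.parameters()), lr=1e-2)

for step in range(200):
    attn = F.softmax(score(x).squeeze(-1), dim=1)        # (N, P) attention weights
    pooled = (attn.unsqueeze(-1) * x).sum(dim=1)         # attention-weighted pooling
    task_loss = F.cross_entropy(clf(pooled), y)
    # Guidance term: pull attention towards the prior, with a weight that decays
    # so the guidance only shapes the early iterations.
    lam = max(0.0, 1.0 - step / 100)
    guide_loss = F.kl_div(attn.clamp_min(1e-8).log(), saliency_prior, reduction="batchmean")
    loss = task_loss + lam * guide_loss
    opt.zero_grad(); loss.backward(); opt.step()
```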

Learning to Detect Human-Object Interactions

Title Learning to Detect Human-Object Interactions
Authors Yu-Wei Chao, Yunfan Liu, Xieyang Liu, Huayi Zeng, Jia Deng
Abstract We study the problem of detecting human-object interactions (HOI) in static images, defined as predicting a human and an object bounding box with an interaction class label that connects them. HOI detection is a fundamental problem in computer vision as it provides semantic information about the interactions among the detected objects. We introduce HICO-DET, a new large benchmark for HOI detection, by augmenting the current HICO classification benchmark with instance annotations. To solve the task, we propose Human-Object Region-based Convolutional Neural Networks (HO-RCNN). At the core of our HO-RCNN is the Interaction Pattern, a novel DNN input that characterizes the spatial relations between two bounding boxes. Experiments on HICO-DET demonstrate that our HO-RCNN, by exploiting human-object spatial relations through Interaction Patterns, significantly improves the performance of HOI detection over baseline approaches.
Tasks Human-Object Interaction Detection
Published 2017-02-17
URL http://arxiv.org/abs/1702.05448v2
PDF http://arxiv.org/pdf/1702.05448v2.pdf
PWC https://paperswithcode.com/paper/learning-to-detect-human-object-interactions
Repo
Framework
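
The abstract's Interaction Pattern is a DNN input that encodes the spatial relation between the human and object boxes. One plausible construction consistent with that description (the exact window and resolution choices are assumptions): rasterise the two boxes into a fixed-size, two-channel binary map inside their tightest enclosing window.

```python
import numpy as np

def interaction_pattern(human_box, object_box, size=64):
    """Two-channel binary map encoding the spatial relation of two (x1, y1, x2, y2)
    boxes, rasterised inside their tightest enclosing window."""
    x1 = min(human_box[0], object_box[0]); y1 = min(human_box[1], object_box[1])
    x2 = max(human_box[2], object_box[2]); y2 = max(human_box[3], object_box[3])
    w, h = max(x2 - x1, 1e-6), max(y2 - y1, 1e-6)
    pattern = np.zeros((2, size, size), dtype=np.float32)
    for c, (bx1, by1, bx2, by2) in enumerate([human_box, object_box]):
        cx1 = int(round((bx1 - x1) / w * (size - 1)))
        cy1 = int(round((by1 - y1) / h * (size - 1)))
        cx2 = int(round((bx2 - x1) / w * (size - 1)))
        cy2 = int(round((by2 - y1) / h * (size - 1)))
        pattern[c, cy1:cy2 + 1, cx1:cx2 + 1] = 1.0     # channel 0: human, channel 1: object
    return pattern

ip = interaction_pattern((50, 60, 120, 220), (100, 180, 200, 260))
```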

Analysis of Italian Word Embeddings

Title Analysis of Italian Word Embeddings
Authors Rocco Tripodi, Stefano Li Pira
Abstract In this work we analyze the performance of two of the most widely used word embedding algorithms, skip-gram and continuous bag of words, on the Italian language. These algorithms have many hyper-parameters that have to be carefully tuned in order to obtain accurate word representations in vector space. We provide a detailed analysis and evaluation, showing which parameter configurations work best for specific tasks.
Tasks Word Embeddings
Published 2017-07-27
URL http://arxiv.org/abs/1707.08783v4
PDF http://arxiv.org/pdf/1707.08783v4.pdf
PWC https://paperswithcode.com/paper/analysis-of-italian-word-embeddings
Repo
Framework
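
The comparison itself is easy to reproduce with gensim (parameter names per gensim ≥ 4); the corpus below is a toy stand-in, and the hyper-parameters shown (vector_size, window, negative, epochs) are exactly the kind of knobs the paper tunes.

```python
from gensim.models import Word2Vec

# Toy corpus standing in for Italian text; real experiments use a large corpus.
sentences = [
    ["il", "gatto", "dorme", "sul", "divano"],
    ["la", "gatta", "dorme", "sulla", "sedia"],
    ["il", "cane", "corre", "nel", "parco"],
] * 200

common = dict(vector_size=100, window=5, min_count=1, negative=10, epochs=20, seed=0)
skipgram = Word2Vec(sentences, sg=1, **common)   # skip-gram
cbow = Word2Vec(sentences, sg=0, **common)       # continuous bag of words

print(skipgram.wv.most_similar("gatto", topn=3))
print(cbow.wv.most_similar("gatto", topn=3))
```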

Low-Rank Hidden State Embeddings for Viterbi Sequence Labeling

Title Low-Rank Hidden State Embeddings for Viterbi Sequence Labeling
Authors Dung Thai, Shikhar Murty, Trapit Bansal, Luke Vilnis, David Belanger, Andrew McCallum
Abstract In textual information extraction and other sequence labeling tasks it is now common to use recurrent neural networks (such as LSTM) to form rich embedded representations of long-term input co-occurrence patterns. Representation of output co-occurrence patterns is typically limited to a hand-designed graphical model, such as a linear-chain CRF representing short-term Markov dependencies among successive labels. This paper presents a method that learns embedded representations of latent output structure in sequence data. Our model takes the form of a finite-state machine with a large number of latent states per label (a latent variable CRF), where the state-transition matrix is factorized—effectively forming an embedded representation of state-transitions capable of enforcing long-term label dependencies, while supporting exact Viterbi inference over output labels. We demonstrate accuracy improvements and interpretable latent structure in a synthetic but complex task based on CoNLL named entity recognition.
Tasks Named Entity Recognition
Published 2017-08-02
URL http://arxiv.org/abs/1708.00553v1
PDF http://arxiv.org/pdf/1708.00553v1.pdf
PWC https://paperswithcode.com/paper/low-rank-hidden-state-embeddings-for-viterbi
Repo
Framework
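
The core trick is to replace a full S×S transition score matrix with a low-rank factorisation U Vᵀ while keeping exact Viterbi decoding unchanged. A minimal numpy sketch with toy scores (no learning, S latent states for a single label):

```python
import numpy as np

rng = np.random.default_rng(0)
S, d, T = 20, 4, 15                       # latent states, embedding dim, sequence length
U, V = rng.normal(size=(S, d)), rng.normal(size=(S, d))
transition = U @ V.T                      # low-rank factorised transition scores (S x S)
emission = rng.normal(size=(T, S))        # toy per-position state scores

def viterbi(emission, transition):
    T, S = emission.shape
    score = emission[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transition + emission[t][None, :]   # (prev, next)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

print(viterbi(emission, transition))
```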

Riemannian tangent space mapping and elastic net regularization for cost-effective EEG markers of brain atrophy in Alzheimer’s disease

Title Riemannian tangent space mapping and elastic net regularization for cost-effective EEG markers of brain atrophy in Alzheimer’s disease
Authors Wolfgang Fruehwirt, Matthias Gerstgrasser, Pengfei Zhang, Leonard Weydemann, Markus Waser, Reinhold Schmidt, Thomas Benke, Peter Dal-Bianco, Gerhard Ransmayr, Dieter Grossegger, Heinrich Garn, Gareth W. Peters, Stephen Roberts, Georg Dorffner
Abstract The diagnosis of Alzheimer’s disease (AD) in routine clinical practice is most commonly based on subjective clinical interpretations. Quantitative electroencephalography (QEEG) measures have been shown to reflect neurodegenerative processes in AD and might qualify as affordable and thereby widely available markers to facilitate the objectivization of AD assessment. Here, we present a novel framework combining Riemannian tangent space mapping and elastic net regression for the development of brain atrophy markers. While most AD QEEG studies are based on small sample sizes and psychological test scores as outcome measures, here we train and test our models using data of one of the largest prospective EEG AD trials ever conducted, including MRI biomarkers of brain atrophy.
Tasks EEG
Published 2017-11-22
URL http://arxiv.org/abs/1711.08359v1
PDF http://arxiv.org/pdf/1711.08359v1.pdf
PWC https://paperswithcode.com/paper/riemannian-tangent-space-mapping-and-elastic
Repo
Framework
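
A compact sketch of the pipeline on synthetic data: map each (toy) EEG covariance matrix into the tangent space at a reference point, vectorise it, and fit an elastic-net regression onto an atrophy score. For simplicity the reference is the arithmetic mean rather than the Riemannian (geometric) mean the name suggests, and all data and hyper-parameters are illustrative.

```python
import numpy as np
from scipy.linalg import sqrtm, logm
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n_subjects, n_channels = 60, 8

def random_spd(n):
    a = rng.normal(size=(n, n))
    return a @ a.T + n * np.eye(n)

covs = np.array([random_spd(n_channels) for _ in range(n_subjects)])  # toy EEG covariances
atrophy = rng.normal(size=n_subjects)                                 # toy MRI atrophy scores

ref = covs.mean(axis=0)                       # reference point (arithmetic mean here)
ref_isqrt = np.linalg.inv(sqrtm(ref)).real
iu = np.triu_indices(n_channels)

def tangent_vector(c):
    s = logm(ref_isqrt @ c @ ref_isqrt).real
    w = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))   # re-weight off-diagonal terms
    return w * s[iu]

X = np.array([tangent_vector(c) for c in covs])
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, atrophy)
```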

Tweeting AI: Perceptions of AI-Tweeters (AIT) vs Expert AI-Tweeters (EAIT)

Title Tweeting AI: Perceptions of AI-Tweeters (AIT) vs Expert AI-Tweeters (EAIT)
Authors Lydia Manikonda, Cameron Dudley, Subbarao Kambhampati
Abstract With the recent advancements in Artificial Intelligence (AI), various organizations and individuals have started debating whether the progress of AI is a blessing or a curse for the future of society. This paper investigates how the public perceives the progress of AI by utilizing data shared on Twitter. Specifically, it performs a comparative analysis of the understanding of users from two categories – general AI-Tweeters (AIT) and expert AI-Tweeters (EAIT) – who share posts about AI on Twitter. Our analysis revealed that users from the two categories express distinct emotions and interests towards AI. Users from both categories regard AI as positive and are optimistic about its progress, but the experts are more negative than the general AI-Tweeters. Characterization of users showed that London is the most common location from which users tweet about AI. Tweets posted by AIT are retweeted more than posts made by EAIT, which suggests greater diffusion of information from AIT.
Tasks
Published 2017-04-27
URL http://arxiv.org/abs/1704.08389v2
PDF http://arxiv.org/pdf/1704.08389v2.pdf
PWC https://paperswithcode.com/paper/tweeting-ai-perceptions-of-ai-tweeters-ait-vs
Repo
Framework

SU-RUG at the CoNLL-SIGMORPHON 2017 shared task: Morphological Inflection with Attentional Sequence-to-Sequence Models

Title SU-RUG at the CoNLL-SIGMORPHON 2017 shared task: Morphological Inflection with Attentional Sequence-to-Sequence Models
Authors Robert Östling, Johannes Bjerva
Abstract This paper describes the Stockholm University/University of Groningen (SU-RUG) system for the SIGMORPHON 2017 shared task on morphological inflection. Our system is based on an attentional sequence-to-sequence neural network model using Long Short-Term Memory (LSTM) cells, with joint training of morphological inflection and the inverse transformation, i.e. lemmatization and morphological analysis. Our system outperforms the baseline by a large margin, and our submission ranks as the 4th best team for the track we participate in (task 1, high-resource).
Tasks Lemmatization, Morphological Analysis, Morphological Inflection
Published 2017-06-12
URL http://arxiv.org/abs/1706.03499v1
PDF http://arxiv.org/pdf/1706.03499v1.pdf
PWC https://paperswithcode.com/paper/su-rug-at-the-conll-sigmorphon-2017-shared
Repo
Framework
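
The joint training of inflection and its inverse mentioned in the abstract can be set up by generating sequence-to-sequence pairs in both directions from the same triples. The direction-token scheme below is an assumption for illustration, not the paper's exact encoding, and the triples are made up rather than shared-task data.

```python
# Build seq2seq training pairs for both directions (inflection and lemmatisation/analysis)
# from SIGMORPHON-style (lemma, tags, form) triples, so one model can be trained jointly.
triples = [
    ("katt", "N;DEF;SG", "katten"),     # illustrative examples, not shared-task data
    ("hund", "N;INDF;PL", "hundar"),
]

pairs = []
for lemma, tags, form in triples:
    src_fwd = ["<inflect>"] + list(lemma) + tags.split(";")
    pairs.append((src_fwd, list(form)))                     # lemma + tags -> inflected form
    src_bwd = ["<analyse>"] + list(form)
    pairs.append((src_bwd, list(lemma) + tags.split(";")))  # inflected form -> lemma + tags

for src, tgt in pairs:
    print(" ".join(src), "->", " ".join(tgt))
```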

OMNIRank: Risk Quantification for P2P Platforms with Deep Learning

Title OMNIRank: Risk Quantification for P2P Platforms with Deep Learning
Authors Honglun Zhang, Haiyang Wang, Xiaming Chen, Yongkun Wang, Yaohui Jin
Abstract P2P lending is an innovative and flexible alternative to conventional lending institutions such as banks, where lenders and borrowers make transactions directly and benefit each other without complicated verifications. However, due to a lack of specialized laws, delegated monitoring and effective management, P2P platforms may spawn potential risks, such as withdrawal failures, investigation involvements and even runaway bosses, which cause great losses to lenders and are especially serious and notorious in China. Although abundant public information and data related to P2P platforms are available on the Internet, their multi-source and heterogeneous nature poses challenges. In this paper, we propose a novel deep learning model, OMNIRank, which comprehends multi-dimensional features of P2P platforms for risk quantification and produces scores for ranking. We first construct a large-scale, flexible crawling framework and obtain large amounts of multi-source heterogeneous data on domestic P2P platforms dating back to 2007. Cleaning steps such as duplicate and noise removal, null handling, format unification and fusion are applied to improve data quality. Then we extract deep features of P2P platforms via text comprehension, topic modeling, knowledge graphs and sentiment analysis, which are delivered as inputs to OMNIRank, a deep learning model for risk quantification of P2P platforms. Finally, according to the rankings generated by OMNIRank, we build rich data visualizations and interactions, providing lenders with comprehensive information support, decision suggestions and safety guarantees.
Tasks Reading Comprehension, Sentiment Analysis
Published 2017-04-27
URL http://arxiv.org/abs/1705.03497v1
PDF http://arxiv.org/pdf/1705.03497v1.pdf
PWC https://paperswithcode.com/paper/omnirank-risk-quantification-for-p2p
Repo
Framework
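
OMNIRank itself is a deep model over text-comprehension, topic, knowledge-graph and sentiment features; the sketch below only shows the final score-then-rank step on hypothetical, already-extracted features, with a generic regressor standing in for the deep network.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
# Hypothetical per-platform features standing in for the paper's extracted signals,
# and toy risk targets for illustration only.
n_platforms = 300
X = rng.normal(size=(n_platforms, 12))
risk = X[:, 0] * 0.8 - X[:, 3] * 0.5 + rng.normal(scale=0.3, size=n_platforms)

scorer = GradientBoostingRegressor(random_state=0).fit(X, risk)
scores = scorer.predict(X)
ranking = np.argsort(-scores)       # platforms ordered from highest to lowest predicted risk
print(ranking[:10])
```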

Generic Tubelet Proposals for Action Localization

Title Generic Tubelet Proposals for Action Localization
Authors Jiawei He, Mostafa S. Ibrahim, Zhiwei Deng, Greg Mori
Abstract We develop a novel framework for action localization in videos. We propose the Tube Proposal Network (TPN), which can generate generic, class-independent, video-level tubelet proposals in videos. The generated tubelet proposals can be utilized in various video analysis tasks, including recognizing and localizing actions in videos. In particular, we integrate these generic tubelet proposals into a unified temporal deep network for action classification. Compared with other methods, our generic tubelet proposal method is accurate, general, and fully differentiable under a smooth L1 loss function. We demonstrate the performance of our algorithm on the standard UCF-Sports, J-HMDB21, and UCF-101 datasets. Our class-independent TPN outperforms other tubelet generation methods, and our unified temporal deep network achieves state-of-the-art localization results on all three datasets.
Tasks Action Classification, Action Localization
Published 2017-05-30
URL http://arxiv.org/abs/1705.10861v1
PDF http://arxiv.org/pdf/1705.10861v1.pdf
PWC https://paperswithcode.com/paper/generic-tubelet-proposals-for-action
Repo
Framework
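
The TPN generates tubelets with a network, but a common baseline makes the notion of a tubelet concrete: greedily link per-frame box proposals across frames by IoU with the tubelet's last box. The sketch below is that baseline, explicitly not the paper's differentiable proposal network.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def link_tubelets(per_frame_boxes, thresh=0.3):
    """Greedily link per-frame boxes into tubelets by IoU with each tubelet's last box."""
    tubelets = [[b] for b in per_frame_boxes[0]]
    for boxes in per_frame_boxes[1:]:
        used = set()
        for tube in tubelets:
            scores = [(iou(tube[-1], b), i) for i, b in enumerate(boxes) if i not in used]
            if scores:
                s, i = max(scores)
                if s >= thresh:
                    tube.append(boxes[i]); used.add(i)
        tubelets.extend([boxes[i]] for i in range(len(boxes)) if i not in used)
    return tubelets

frames = [
    [(10, 10, 50, 60), (200, 40, 260, 120)],
    [(12, 14, 52, 63), (205, 42, 263, 121)],
    [(15, 18, 55, 66)],
]
tubes = link_tubelets(frames)
```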

DeepDeath: Learning to Predict the Underlying Cause of Death with Big Data

Title DeepDeath: Learning to Predict the Underlying Cause of Death with Big Data
Authors Hamid Reza Hassanzadeh, Ying Sha, May D. Wang
Abstract Multiple cause-of-death data provide a valuable source of information that can be used to enhance health standards by predicting health-related trajectories in societies with large populations. These data are often available in large quantities across U.S. states and require Big Data techniques to uncover complex hidden patterns. We design two different classes of models suitable for large-scale analysis of mortality data: a Hadoop-based ensemble of random forests trained over N-grams, and DeepDeath, a deep classifier based on a recurrent neural network (RNN). We apply both classes to the mortality data provided by the National Center for Health Statistics and show that while both perform significantly better than a random classifier, the deep model, which utilizes long short-term memory (LSTM) networks, surpasses the N-gram based models and is capable of learning the temporal aspect of the data without the need to build ad hoc, expert-driven features.
Tasks
Published 2017-05-06
URL http://arxiv.org/abs/1705.03508v1
PDF http://arxiv.org/pdf/1705.03508v1.pdf
PWC https://paperswithcode.com/paper/deepdeath-learning-to-predict-the-underlying
Repo
Framework
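
The N-gram branch of the comparison is easy to sketch on a single machine (the paper uses a Hadoop-based ensemble): treat each multiple-cause record as a sequence of cause codes, extract unigram/bigram counts, and train a random forest. The codes, groupings and record counts below are all made up.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Toy multiple-cause records: each record is a sequence of contributing-cause codes
# (hypothetical ICD-style tokens); the label is a coarse underlying-cause group.
records = ["I21 I25 E11", "I25 I10", "C34 J44", "J44 J18", "I21 I10 E11", "C34 C78"] * 50
labels  = ["circulatory", "circulatory", "cancer", "respiratory", "circulatory", "cancer"] * 50

vec = CountVectorizer(ngram_range=(1, 2), token_pattern=r"\S+")
X = vec.fit_transform(records)             # unigram + bigram counts over code sequences
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)
print(clf.predict(vec.transform(["I25 E11"])))
```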

Outlier-robust moment-estimation via sum-of-squares

Title Outlier-robust moment-estimation via sum-of-squares
Authors Pravesh K. Kothari, David Steurer
Abstract We develop efficient algorithms for estimating low-degree moments of unknown distributions in the presence of adversarial outliers. The guarantees of our algorithms improve in many cases significantly over the best previous ones, obtained in recent works of Diakonikolas et al, Lai et al, and Charikar et al. We also show that the guarantees of our algorithms match information-theoretic lower-bounds for the class of distributions we consider. These improved guarantees allow us to give improved algorithms for independent component analysis and learning mixtures of Gaussians in the presence of outliers. Our algorithms are based on a standard sum-of-squares relaxation of the following conceptually-simple optimization problem: Among all distributions whose moments are bounded in the same way as for the unknown distribution, find the one that is closest in statistical distance to the empirical distribution of the adversarially-corrupted sample.
Tasks
Published 2017-11-30
URL http://arxiv.org/abs/1711.11581v2
PDF http://arxiv.org/pdf/1711.11581v2.pdf
PWC https://paperswithcode.com/paper/outlier-robust-moment-estimation-via-sum-of
Repo
Framework