October 16, 2019

Paper Group ANR 988

RGB-T Object Tracking:Benchmark and Baseline


Title	RGB-T Object Tracking:Benchmark and Baseline
Authors	Chenglong Li, Xinyan Liang, Yijuan Lu, Nan Zhao, Jin Tang
Abstract	RGB-Thermal (RGB-T) object tracking receives more and more attention due to the strongly complementary benefits of thermal information to visible data. However, RGB-T research is limited by lacking a comprehensive evaluation platform. In this paper, we propose a large-scale video benchmark dataset for RGB-T tracking. It has three major advantages over existing ones: 1) Its size is sufficiently large for large-scale performance evaluation (total frame number: 234K, maximum frame per sequence: 8K). 2) The alignment between RGB-T sequence pairs is highly accurate, which does not need pre- or post-processing. 3) The occlusion levels are annotated for occlusion-sensitive performance analysis of different tracking algorithms. Moreover, we propose a novel graph-based approach to learn a robust object representation for RGB-T tracking. In particular, the tracked object is represented with a graph with image patches as nodes. This graph including graph structure, node weights and edge weights is dynamically learned in a unified ADMM (alternating direction method of multipliers)-based optimization framework, in which the modality weights are also incorporated for adaptive fusion of multiple source data. Extensive experiments on the large-scale dataset are executed to demonstrate the effectiveness of the proposed tracker against other state-of-the-art tracking methods. We also provide new insights and potential research directions to the field of RGB-T object tracking.
Tasks	Object Tracking, Rgb-T Tracking
Published	2018-05-23
URL	http://arxiv.org/abs/1805.08982v1
PDF	http://arxiv.org/pdf/1805.08982v1.pdf
PWC	https://paperswithcode.com/paper/rgb-t-object-trackingbenchmark-and-baseline
Repo
Framework

Multi-Domain Pose Network for Multi-Person Pose Estimation and Tracking


Title	Multi-Domain Pose Network for Multi-Person Pose Estimation and Tracking
Authors	Hengkai Guo, Tang Tang, Guozhong Luo, Riwei Chen, Yongchen Lu, Linfu Wen
Abstract	Multi-person human pose estimation and tracking in the wild is important and challenging. For training a powerful model, large-scale training data are crucial. While there are several datasets for human pose estimation, the best practice for training on multi-dataset has not been investigated. In this paper, we present a simple network called Multi-Domain Pose Network (MDPN) to address this problem. By treating the task as multi-domain learning, our methods can learn a better representation for pose prediction. Together with prediction heads fine-tuning and multi-branch combination, it shows significant improvement over baselines and achieves the best performance on PoseTrack ECCV 2018 Challenge without additional datasets other than MPII and COCO.
Tasks	Multi-Person Pose Estimation, Multi-Person Pose Estimation and Tracking, Pose Estimation, Pose Prediction
Published	2018-10-19
URL	http://arxiv.org/abs/1810.08338v1
PDF	http://arxiv.org/pdf/1810.08338v1.pdf
PWC	https://paperswithcode.com/paper/multi-domain-pose-network-for-multi-person
Repo
Framework

An Analysis of the Semantic Annotation Task on the Linked Data Cloud


Title	An Analysis of the Semantic Annotation Task on the Linked Data Cloud
Authors	Gagnon Michel, Zouaq Amal, Aranha Francisco, Ensan Faezeh, Jean-Louis Ludovic
Abstract	Semantic annotation, the process of identifying key-phrases in texts and linking them to concepts in a knowledge base, is an important basis for semantic information retrieval and the Semantic Web uptake. Despite the emergence of semantic annotation systems, very few comparative studies have been published on their performance. In this paper, we provide an evaluation of the performance of existing systems over three tasks: full semantic annotation, named entity recognition, and keyword detection. More specifically, the spotting capability (recognition of relevant surface forms in text) is evaluated for all three tasks, whereas the disambiguation (correctly associating an entity from Wikipedia or DBpedia to the spotted surface forms) is evaluated only for the first two tasks. Our evaluation is twofold: First, we compute standard precision and recall on the output of semantic annotators on diverse datasets, each best suited for one of the identified tasks. Second, we build a statistical model using logistic regression to identify significant performance differences. Our results show that systems that provide full annotation perform better than named entities annotators and keyword extractors, for all three tasks. However, there is still much room for improvement for the identification of the most relevant entities described in a text.
Tasks	Information Retrieval, Named Entity Recognition
Published	2018-11-13
URL	http://arxiv.org/abs/1811.05549v1
PDF	http://arxiv.org/pdf/1811.05549v1.pdf
PWC	https://paperswithcode.com/paper/an-analysis-of-the-semantic-annotation-task
Repo
Framework
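
The "standard precision and recall" mentioned in the abstract above can be illustrated with a minimal sketch. The annotation pairs and the `precision_recall` helper are hypothetical, made up for illustration; they are not taken from the paper or its datasets.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of annotations."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of predicted annotations that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold annotations that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical (surface form, linked entity) pairs, for illustration only.
gold = {
    ("Paris", "dbpedia:Paris"),
    ("Seine", "dbpedia:Seine"),
    ("France", "dbpedia:France"),
}
predicted = {
    ("Paris", "dbpedia:Paris"),               # spotted and disambiguated correctly
    ("Seine", "dbpedia:Seine_(department)"),  # spotted, but linked to the wrong entity
}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Scoring on the surface form alone instead of the full pair would measure spotting only, which is the distinction the abstract draws between spotting and disambiguation.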

Praaline: Integrating Tools for Speech Corpus Research


Title	Praaline: Integrating Tools for Speech Corpus Research
Authors	George Christodoulides
Abstract	This paper presents Praaline, an open-source software system for managing, annotating, analysing and visualising speech corpora. Researchers working with speech corpora are often faced with multiple tools and formats, and they need to work with ever-increasing amounts of data in a collaborative way. Praaline integrates and extends existing time-proven tools for spoken corpora analysis (Praat, Sonic Visualiser and a bridge to the R statistical package) in a modular system, facilitating automation and reuse. Users are exposed to an integrated, user-friendly interface from which to access multiple tools. Corpus metadata and annotations may be stored in a database, locally or remotely, and users can define the metadata and annotation structure. Users may run a customisable cascade of analysis steps, based on plug-ins and scripts, and update the database with the results. The corpus database may be queried, to produce aggregated data-sets. Praaline is extensible using Python or C++ plug-ins, while Praat and R scripts may be executed against the corpus data. A series of visualisations, editors and plug-ins are provided. Praaline is free software, released under the GPL license.
Tasks
Published	2018-02-08
URL	http://arxiv.org/abs/1802.02914v1
PDF	http://arxiv.org/pdf/1802.02914v1.pdf
PWC	https://paperswithcode.com/paper/praaline-integrating-tools-for-speech-corpus
Repo
Framework

Contrasting information theoretic decompositions of modulatory and arithmetic interactions in neural information processing systems


Title	Contrasting information theoretic decompositions of modulatory and arithmetic interactions in neural information processing systems
Authors	Jim W. Kay, William A. Phillips
Abstract	Biological and artificial neural systems are composed of many local processors, and their capabilities depend upon the transfer function that relates each local processor’s outputs to its inputs. This paper uses a recent advance in the foundations of information theory to study the properties of local processors that use contextual input to amplify or attenuate transmission of information about their driving inputs. This advance enables the information transmitted by processors with two distinct inputs to be decomposed into those components unique to each input, that shared between the two inputs, and that which depends on both though it is in neither, i.e. synergy. The decompositions that we report here show that contextual modulation has information processing properties that contrast with those of all four simple arithmetic operators, that it can take various forms, and that the form used in our previous studies of artificial neural nets composed of local processors with both driving and contextual inputs is particularly well-suited to provide the distinctive capabilities of contextual modulation under a wide range of conditions. We argue that the decompositions reported here could be compared with those obtained from empirical neurobiological and psychophysical data under conditions thought to reflect contextual modulation. That would then shed new light on the underlying processes involved. Finally, we suggest that such decompositions could aid the design of context-sensitive machine learning algorithms.
Tasks
Published	2018-03-15
URL	http://arxiv.org/abs/1803.05897v1
PDF	http://arxiv.org/pdf/1803.05897v1.pdf
PWC	https://paperswithcode.com/paper/contrasting-information-theoretic
Repo
Framework

Election with Bribed Voter Uncertainty: Hardness and Approximation Algorithm


Title	Election with Bribed Voter Uncertainty: Hardness and Approximation Algorithm
Authors	Lin Chen, Lei Xu, Shouhuai Xu, Zhimin Gao, Weidong Shi
Abstract	Bribery in election (or computational social choice in general) is an important problem that has received a considerable amount of attention. In the classic bribery problem, the briber (or attacker) bribes some voters in attempting to make the briber’s designated candidate win an election. In this paper, we introduce a novel variant of the bribery problem, “Election with Bribed Voter Uncertainty” or BVU for short, accommodating the uncertainty that the vote of a bribed voter may or may not be counted. This uncertainty occurs either because a bribed voter may not cast its vote in fear of being caught, or because a bribed voter is indeed caught and therefore its vote is discarded. As a first step towards ultimately understanding and addressing this important problem, we show that it does not admit any multiplicative $O(1)$-approximation algorithm modulo standard complexity assumptions. We further show that there is an approximation algorithm that returns a solution with an additive-$\epsilon$ error in FPT time for any fixed $\epsilon$.
Tasks
Published	2018-11-07
URL	http://arxiv.org/abs/1811.03158v1
PDF	http://arxiv.org/pdf/1811.03158v1.pdf
PWC	https://paperswithcode.com/paper/election-with-bribed-voter-uncertainty
Repo
Framework

Stable Recurrent Models


Title	Stable Recurrent Models
Authors	John Miller, Moritz Hardt
Abstract	Stability is a fundamental property of dynamical systems, yet to this date it has had little bearing on the practice of recurrent neural networks. In this work, we conduct a thorough investigation of stable recurrent models. Theoretically, we prove stable recurrent neural networks are well approximated by feed-forward networks for the purpose of both inference and training by gradient descent. Empirically, we demonstrate stable recurrent models often perform as well as their unstable counterparts on benchmark sequence tasks. Taken together, these findings shed light on the effective power of recurrent networks and suggest much of sequence learning happens, or can be made to happen, in the stable regime. Moreover, our results help to explain why in many cases practitioners succeed in replacing recurrent models by feed-forward models.
Tasks
Published	2018-05-25
URL	http://arxiv.org/abs/1805.10369v4
PDF	http://arxiv.org/pdf/1805.10369v4.pdf
PWC	https://paperswithcode.com/paper/stable-recurrent-models
Repo
Framework
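
As a rough illustration of why stability matters, the sketch below runs a scalar recurrence whose state-transition map is a contraction and checks that the influence of the initial state decays geometrically. This assumes stability means a contractive state-transition map; the scalar model, the weights `w` and `u`, and the input sequence are hypothetical choices, not the paper's setup.

```python
import math


def run_rnn(h0, inputs, w=0.5, u=1.0):
    """Scalar recurrence h <- tanh(w * h + u * x); contractive in h when |w| < 1."""
    h = h0
    for x in inputs:
        h = math.tanh(w * h + u * x)
    return h


# Hypothetical input sequence, for illustration only.
inputs = [0.3, -1.2, 0.8, 0.1, -0.5, 1.0, 0.4, -0.9, 0.2, 0.6]

# Start from two very different hidden states and feed the same inputs.
gap_start = abs(1.0 - (-1.0))
gap_end = abs(run_rnn(1.0, inputs) - run_rnn(-1.0, inputs))

# tanh is 1-Lipschitz, so each step shrinks the gap by at least a factor |w|.
bound = 0.5 ** len(inputs) * gap_start
print(gap_end <= bound)
```

Because old state is forgotten at this rate, a model that only sees a finite window of recent inputs can approximate the recurrence, which is the intuition behind the feed-forward approximation mentioned in the abstract.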

Towards Deep Learning based Hand Keypoints Detection for Rapid Sequential Movements from RGB Images


Title	Towards Deep Learning based Hand Keypoints Detection for Rapid Sequential Movements from RGB Images
Authors	Srujana Gattupalli, Ashwin Ramesh Babu, James Robert Brady, Fillia Makedon, Vassilis Athitsos
Abstract	Hand keypoints detection and pose estimation have numerous applications in computer vision, but they remain an unsolved problem in many aspects. An application of hand keypoints detection is in performing cognitive assessments of a subject by observing the performance of that subject in physical tasks involving rapid finger motion. As a part of this work, we introduce a novel hand keypoints benchmark dataset that consists of hand gestures recorded specifically for cognitive behavior monitoring. We explore the state-of-the-art methods in hand keypoint detection and we provide quantitative evaluations for the performance of these methods on our dataset. In the future, these results and our dataset can serve as a useful benchmark for hand keypoint recognition for rapid finger movements.
Tasks	Keypoint Detection, Pose Estimation
Published	2018-04-03
URL	http://arxiv.org/abs/1804.01174v1
PDF	http://arxiv.org/pdf/1804.01174v1.pdf
PWC	https://paperswithcode.com/paper/towards-deep-learning-based-hand-keypoints
Repo
Framework

A Parallel Double Greedy Algorithm for Submodular Maximization


Title	A Parallel Double Greedy Algorithm for Submodular Maximization
Authors	Alina Ene, Huy L. Nguyen, Adrian Vladu
Abstract	We study parallel algorithms for the problem of maximizing a non-negative submodular function. Our main result is an algorithm that achieves a nearly-optimal $1/2 -\epsilon$ approximation using $O(\log(1/\epsilon) / \epsilon)$ parallel rounds of function evaluations. Our algorithm is based on a continuous variant of the double greedy algorithm of Buchbinder et al. that achieves the optimal $1/2$ approximation in the sequential setting. Our algorithm applies more generally to the problem of maximizing a continuous diminishing-returns (DR) function.
Tasks
Published	2018-12-04
URL	http://arxiv.org/abs/1812.01591v1
PDF	http://arxiv.org/pdf/1812.01591v1.pdf
PWC	https://paperswithcode.com/paper/a-parallel-double-greedy-algorithm-for
Repo
Framework
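
For orientation, here is a minimal sketch of the sequential randomized double greedy algorithm of Buchbinder et al. that the abstract builds on. It is not the parallel, continuous variant proposed in the paper. The cut function on a 4-cycle is a hypothetical example of a non-negative, non-monotone submodular function.

```python
import random

# Hypothetical example: edges of a 4-cycle; its cut function is submodular.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]


def cut_value(subset):
    """Number of edges with exactly one endpoint in `subset`."""
    return sum(1 for u, v in EDGES if (u in subset) != (v in subset))


def double_greedy(f, ground_set, rng):
    """Randomized double greedy for unconstrained submodular maximization."""
    x = set()            # grows from the empty set
    y = set(ground_set)  # shrinks from the full ground set
    for u in ground_set:
        gain_add = max(f(x | {u}) - f(x), 0)     # marginal gain of adding u to x
        gain_remove = max(f(y - {u}) - f(y), 0)  # marginal gain of dropping u from y
        total = gain_add + gain_remove
        p_add = 1.0 if total == 0 else gain_add / total
        if rng.random() < p_add:
            x.add(u)      # keep u in both sets
        else:
            y.discard(u)  # drop u from both sets
    return x              # x and y coincide once every element is decided


solution = double_greedy(cut_value, [0, 1, 2, 3], random.Random(0))
print(sorted(solution), cut_value(solution))
```

On this small instance only the first element involves a real coin flip, and both branches end at a maximum cut of value 4. In general the 1/2 guarantee holds only in expectation.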

Classification of X-Ray Protein Crystallization Using Deep Convolutional Neural Networks with a Finder Module


Title	Classification of X-Ray Protein Crystallization Using Deep Convolutional Neural Networks with a Finder Module
Authors	Yusei Miura, Tetsuya Sakurai, Claus Aranha, Toshiya Senda, Ryuichi Kato, Yusuke Yamada
Abstract	Recently, deep convolutional neural networks have shown good results for image recognition. In this paper, we use convolutional neural networks with a finder module, which discovers the important region for recognition and extracts that region. We propose applying our method to the recognition of protein crystals for X-ray structural analysis. In this analysis, it is necessary to recognize states of protein crystallization from a large number of images. There are several methods that realize protein crystallization recognition by using convolutional neural networks. In each method, large-scale data sets are required to recognize with high accuracy. In our data set, the number of images is not large enough to train a CNN. The amount of data for CNNs is a serious issue in various fields. Our method realizes high accuracy recognition with few images by discovering the region where the crystallization drop exists. We compared our crystallization image recognition method with a high precision method using Inception-V3. We demonstrate that our method is effective for crystallization images using several experiments. Our method achieved an AUC value that is about 5% higher than that of the compared method.
Tasks
Published	2018-12-25
URL	http://arxiv.org/abs/1812.10087v1
PDF	http://arxiv.org/pdf/1812.10087v1.pdf
PWC	https://paperswithcode.com/paper/classification-of-x-ray-protein
Repo
Framework

Automatic Detection of Reflective Thinking in Mathematical Problem Solving based on Unconstrained Bodily Exploration


Title	Automatic Detection of Reflective Thinking in Mathematical Problem Solving based on Unconstrained Bodily Exploration
Authors	Temitayo A. Olugbade, Joseph Newbold, Rose Johnson, Erica Volta, Paolo Alborno, Radoslaw Niewiadomski, Max Dillon, Gualtiero Volpe, Nadia Bianchi-Berthouze
Abstract	For technology (like serious games) that aims to deliver interactive learning, it is important to address relevant mental experiences such as reflective thinking during problem solving. To facilitate research in this direction, we present the weDraw-1 Movement Dataset of body movement sensor data and reflective thinking labels for 26 children solving mathematical problems in unconstrained settings where the body (full or parts) was required to explore these problems. Further, we provide qualitative analysis of behaviours that observers used in identifying reflective thinking moments in these sessions. The body movement cues from our compilation informed features that led to an average F1 score of 0.73 for automatic detection of reflective thinking based on Long Short-Term Memory neural networks. We further obtained 0.79 average F1 score for end-to-end detection of reflective thinking periods, i.e. based on raw sensor data. Finally, the algorithms resulted in 0.64 average F1 score for period subsegments as short as 4 seconds. Overall, our results show the possibility of detecting reflective thinking moments from body movement behaviours of a child exploring mathematical concepts bodily, such as within serious game play.
Tasks
Published	2018-12-18
URL	https://arxiv.org/abs/1812.07941v2
PDF	https://arxiv.org/pdf/1812.07941v2.pdf
PWC	https://paperswithcode.com/paper/automatic-detection-of-reflective-thinking-in
Repo
Framework

Learning-based Image Reconstruction via Parallel Proximal Algorithm


Title	Learning-based Image Reconstruction via Parallel Proximal Algorithm
Authors	Emrah Bostan, Ulugbek S. Kamilov, Laura Waller
Abstract	In the past decade, sparsity-driven regularization has led to advancement of image reconstruction algorithms. Traditionally, such regularizers rely on analytical models of sparsity (e.g. total variation (TV)). However, more recent methods are increasingly centered around data-driven arguments inspired by deep learning. In this letter, we propose to generalize TV regularization by replacing the l1-penalty with an alternative prior that is trainable. Specifically, our method learns the prior via extending the recently proposed fast parallel proximal algorithm (FPPA) to incorporate data-adaptive proximal operators. The proposed framework does not require additional inner iterations for evaluating the proximal mappings of the corresponding learned prior. Moreover, our formalism ensures that the training and reconstruction processes share the same algorithmic structure, making the end-to-end implementation intuitive. As an example, we demonstrate our algorithm on the problem of deconvolution in a fluorescence microscope.
Tasks	Image Reconstruction
Published	2018-01-29
URL	http://arxiv.org/abs/1801.09518v1
PDF	http://arxiv.org/pdf/1801.09518v1.pdf
PWC	https://paperswithcode.com/paper/learning-based-image-reconstruction-via
Repo
Framework

Generative Adversarial Networks (GANs): What it can generate and What it cannot?


Title	Generative Adversarial Networks (GANs): What it can generate and What it cannot?
Authors	P Manisha, Sujit Gujar
Abstract	In recent years, Generative Adversarial Networks (GANs) have received significant attention from the research community. With a straightforward implementation and outstanding results, GANs have been used for numerous applications. Despite the success, GANs lack a proper theoretical explanation. These models suffer from issues like mode collapse, non-convergence, and instability during training. To address these issues, researchers have proposed theoretically rigorous frameworks inspired by varied fields of Game theory, Statistical theory, Dynamical systems, etc. In this paper, we propose to give an appropriate structure to study these contributions systematically. We essentially categorize the papers based on the issues they raise and the kind of novelty they introduce to address them. Besides, we provide insight into how each of the discussed articles solves the concerned problems. We compare and contrast different results and put forth a summary of theoretical contributions about GANs with focus on image/visual applications. We expect this summary paper to give a bird’s eye view to a person wishing to understand the theoretical progress in GANs so far.
Tasks
Published	2018-03-31
URL	https://arxiv.org/abs/1804.00140v2
PDF	https://arxiv.org/pdf/1804.00140v2.pdf
PWC	https://paperswithcode.com/paper/generative-adversarial-networks-gans-what-it
Repo
Framework

Deep Reader: Information extraction from Document images via relation extraction and Natural Language


Title	Deep Reader: Information extraction from Document images via relation extraction and Natural Language
Authors	Vishwanath D, Rohit Rahul, Gunjan Sehgal, Swati, Arindam Chowdhury, Monika Sharma, Lovekesh Vig, Gautam Shroff, Ashwin Srinivasan
Abstract	Recent advancements in the area of Computer Vision with state-of-the-art Neural Networks have given a boost to Optical Character Recognition (OCR) accuracies. However, extracting characters/text alone is often insufficient for relevant information extraction as documents also have a visual structure that is not captured by OCR. Extracting information from tables, charts, footnotes, boxes, headings and retrieving the corresponding structured representation for the document remains a challenge and finds application in a large number of real-world use cases. In this paper, we propose a novel enterprise based end-to-end framework called DeepReader which facilitates information extraction from document images via identification of visual entities and populating a meta relational model across different entities in the document image. The model schema allows for an easy to understand abstraction of the entities detected by the deep vision models and the relationships between them. DeepReader has a suite of state-of-the-art vision algorithms which are applied to recognize handwritten and printed text, eliminate noisy effects, identify the type of documents and detect visual entities like tables, lines and boxes. Deep Reader maps the extracted entities into a rich relational schema so as to capture all the relevant relationships between entities (words, textboxes, lines etc.) detected in the document. Relevant information and fields can then be extracted from the document by writing SQL queries on top of the relationship tables. A natural language based interface is added on top of the relationship schema so that a non-technical user, specifying the queries in natural language, can fetch the information with minimal effort. In this paper, we also demonstrate many different capabilities of Deep Reader and report results on a real-world use case.
Tasks	Optical Character Recognition, Relation Extraction
Published	2018-12-11
URL	http://arxiv.org/abs/1812.04377v2
PDF	http://arxiv.org/pdf/1812.04377v2.pdf
PWC	https://paperswithcode.com/paper/deep-reader-information-extraction-from
Repo
Framework

A Constant Step Stochastic Douglas-Rachford Algorithm with Application to Non Separable Regularizations


Title	A Constant Step Stochastic Douglas-Rachford Algorithm with Application to Non Separable Regularizations
Authors	Adil Salim, Pascal Bianchi, Walid Hachem
Abstract	The Douglas-Rachford algorithm is an algorithm that converges to a minimizer of a sum of two convex functions. The algorithm consists of fixed point iterations involving computations of the proximity operators of the two functions separately. The paper investigates a stochastic version of the algorithm where both functions are random and the step size is constant. We establish that the iterates of the algorithm stay close to the set of solutions with high probability when the step size is small enough. Application to structured regularization is considered.
Tasks
Published	2018-04-03
URL	http://arxiv.org/abs/1804.00934v1
PDF	http://arxiv.org/pdf/1804.00934v1.pdf
PWC	https://paperswithcode.com/paper/a-constant-step-stochastic-douglas-rachford
Repo
Framework
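
The fixed point iteration described in the abstract can be sketched in its classical deterministic form. This is not the paper's stochastic variant, in which both functions are random. The one-dimensional problem, minimizing 0.5 * (x - a)^2 + lam * |x|, and the values of `a`, `lam` and the step size `gamma` are hypothetical choices for illustration.

```python
def soft_threshold(z, t):
    """Proximity operator of t * |x| evaluated at z."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def douglas_rachford(prox_f, prox_g, z0=0.0, iterations=100):
    """Deterministic Douglas-Rachford iteration for minimizing f + g."""
    z = z0
    x = prox_f(z)
    for _ in range(iterations):
        x = prox_f(z)            # proximal step on f
        y = prox_g(2.0 * x - z)  # proximal step on g at the reflected point
        z = z + y - x            # fixed point update of the auxiliary variable
    return x


# Hypothetical problem data: f(x) = 0.5 * (x - a)^2 and g(x) = lam * |x|.
a, lam, gamma = 3.0, 1.0, 1.0
prox_f = lambda z: (z + gamma * a) / (1.0 + gamma)  # prox of gamma * f
prox_g = lambda z: soft_threshold(z, gamma * lam)   # prox of gamma * g

x_star = douglas_rachford(prox_f, prox_g)
print(x_star)
```

For this toy problem the minimizer has the closed form `soft_threshold(a, lam)`, which gives a convenient check on the iterate.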