Paper Group ANR 568
Efficient phase retrieval based on dark fringe recognition with an ability of bypassing invalid fringes. Deep Active Learning for Dialogue Generation. EgoCap: Egocentric Marker-less Motion Capture with Two Fisheye Cameras. Learning Robust Features for Gait Recognition by Maximum Margin Criterion. A Survey of Visual Analysis of Human Motion and Its …
Efficient phase retrieval based on dark fringe recognition with an ability of bypassing invalid fringes
Title | Efficient phase retrieval based on dark fringe recognition with an ability of bypassing invalid fringes |
Authors | Wen-Kai Yu, An-Dong Xiong, Xu-Ri Yao, Guang-Jie Zhai, Qing Zhao |
Abstract | This paper discusses the noisy phase retrieval problem: recovering a complex image signal with independent noise from quadratic measurements. Inspired by the dark fringes shown in the measured images of the array detector, a novel phase retrieval approach is proposed and demonstrated both theoretically and experimentally to recognize the dark fringes and bypass the invalid fringes. A more accurate relative phase ratio between two arbitrary pixels is achieved by calculating the multiplicative ratios (or the sum of phase differences) on the path between them. Then the object phase image can be reconstructed precisely. Our approach is a good choice for retrieving high-quality phase images from noisy signals and has many potential applications in fields such as X-ray crystallography, diffractive imaging, and so on. |
Tasks | |
Published | 2016-12-14 |
URL | http://arxiv.org/abs/1612.04733v1 |
http://arxiv.org/pdf/1612.04733v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-phase-retrieval-based-on-dark |
Repo | |
Framework | |
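The abstract's central idea, accumulating phase differences along a pixel path while bypassing invalid (dark-fringe) pixels, can be illustrated with a small sketch. The grid routing, the `valid` mask, and the wrapped-difference accumulation below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np
from collections import deque

def bfs_path(valid, start, goal):
    """Shortest 4-connected path from start to goal through pixels marked valid."""
    h, w = valid.shape
    prev = {start: None}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        if p == goal:
            break
        y, x = p
        for q in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= q[0] < h and 0 <= q[1] < w and valid[q] and q not in prev:
                prev[q] = p
                queue.append(q)
    if goal not in prev:
        return None
    path, p = [], goal
    while p is not None:
        path.append(p)
        p = prev[p]
    return path[::-1]

def relative_phase(phase, valid, start, goal):
    """Sum wrapped phase differences along a path that bypasses invalid pixels."""
    path = bfs_path(valid, start, goal)
    if path is None:
        raise ValueError("no valid path between the two pixels")
    steps = [np.angle(np.exp(1j * (phase[b] - phase[a])))   # wrap each step to (-pi, pi]
             for a, b in zip(path, path[1:])]
    total = sum(steps)
    return (total + np.pi) % (2 * np.pi) - np.pi             # wrap the accumulated phase
```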
Deep Active Learning for Dialogue Generation
Title | Deep Active Learning for Dialogue Generation |
Authors | Nabiha Asghar, Pascal Poupart, Xin Jiang, Hang Li |
Abstract | We propose an online, end-to-end, neural generative conversational model for open-domain dialogue. It is trained using a unique combination of offline two-phase supervised learning and online human-in-the-loop active learning. While most existing research proposes offline supervision or hand-crafted reward functions for online reinforcement, we devise a novel interactive learning mechanism based on hamming-diverse beam search for response generation and one-character user-feedback at each step. Experiments show that our model inherently promotes the generation of semantically relevant and interesting responses, and can be used to train agents with customized personas, moods and conversational styles. |
Tasks | Active Learning, Dialogue Generation |
Published | 2016-12-12 |
URL | http://arxiv.org/abs/1612.03929v5 |
http://arxiv.org/pdf/1612.03929v5.pdf | |
PWC | https://paperswithcode.com/paper/deep-active-learning-for-dialogue-generation |
Repo | |
Framework | |
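As a rough illustration of the Hamming-diverse expansion mentioned in the abstract, the sketch below penalizes, at a single decoding step, tokens that earlier beams have already picked; the penalty weight `lam` and the greedy per-beam expansion are simplifying assumptions, not the paper's exact procedure.

```python
import numpy as np

def hamming_diverse_step(log_probs, lam=0.5):
    """One decoding step with a Hamming-diversity penalty (sketch).

    log_probs : (num_beams, vocab_size) token log-probabilities, one row per beam.
    Beams are expanded one after another; each beam's scores are penalized by how
    often earlier beams have already picked a token at this step, so the selected
    continuations differ from each other.
    """
    num_beams, vocab_size = log_probs.shape
    usage = np.zeros(vocab_size)          # how many earlier beams chose each token
    picks = []
    for b in range(num_beams):
        scores = log_probs[b] - lam * usage
        token = int(np.argmax(scores))
        picks.append(token)
        usage[token] += 1.0
    return picks
```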
EgoCap: Egocentric Marker-less Motion Capture with Two Fisheye Cameras
Title | EgoCap: Egocentric Marker-less Motion Capture with Two Fisheye Cameras |
Authors | Helge Rhodin, Christian Richardt, Dan Casas, Eldar Insafutdinov, Mohammad Shafiei, Hans-Peter Seidel, Bernt Schiele, Christian Theobalt |
Abstract | Marker-based and marker-less optical skeletal motion-capture methods use an outside-in arrangement of cameras placed around a scene, with viewpoints converging on the center. They often cause discomfort through the marker suits they may require, and their recording volume is severely restricted and often constrained to indoor scenes with controlled backgrounds. Alternative suit-based systems use several inertial measurement units or an exoskeleton to capture motion. This makes capturing independent of a confined volume, but requires substantial, often constraining, and hard-to-set-up body instrumentation. We therefore propose a new method for real-time, marker-less and egocentric motion capture which estimates the full-body skeleton pose from a lightweight stereo pair of fisheye cameras that are attached to a helmet or virtual reality headset. It combines the strength of a new generative pose estimation framework for fisheye views with a ConvNet-based body-part detector trained on a large new dataset. Our inside-in method captures full-body motion in general indoor and outdoor scenes, and also in crowded scenes with many people in close vicinity. The captured user can freely move around, which enables reconstruction of larger-scale activities and is particularly useful in virtual reality to freely roam and interact while seeing the fully motion-captured virtual body. |
Tasks | Motion Capture, Pose Estimation |
Published | 2016-09-23 |
URL | http://arxiv.org/abs/1609.07306v1 |
http://arxiv.org/pdf/1609.07306v1.pdf | |
PWC | https://paperswithcode.com/paper/egocap-egocentric-marker-less-motion-capture-1 |
Repo | |
Framework | |
Learning Robust Features for Gait Recognition by Maximum Margin Criterion
Title | Learning Robust Features for Gait Recognition by Maximum Margin Criterion |
Authors | Michal Balazia, Petr Sojka |
Abstract | In the field of gait recognition from motion capture data, designing human-interpretable gait features is a common practice of many fellow researchers. To refrain from ad-hoc schemes and to find maximally discriminative features we may need to explore beyond the limits of human interpretability. This paper contributes to the state-of-the-art with a machine learning approach for extracting robust gait features directly from raw joint coordinates. The features are learned by a modification of Linear Discriminant Analysis with Maximum Margin Criterion so that the identities are maximally separated and, in combination with an appropriate classifier, used for gait recognition. Experiments on the CMU MoCap database show that this method outperforms eight other relevant methods in terms of the distribution of biometric templates in respective feature spaces expressed in four class separability coefficients. Additional experiments indicate that this method is a leading concept for rank-based classifier systems. |
Tasks | Gait Recognition, Motion Capture |
Published | 2016-09-14 |
URL | http://arxiv.org/abs/1609.04392v5 |
http://arxiv.org/pdf/1609.04392v5.pdf | |
PWC | https://paperswithcode.com/paper/learning-robust-features-for-gait-recognition |
Repo | |
Framework | |
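The Maximum Margin Criterion the abstract refers to has a standard closed-form recipe: take the leading eigenvectors of the difference between the between-class and within-class scatter matrices. The sketch below shows that generic recipe; the paper's specific modification of LDA and the classifier it is combined with are not reproduced.

```python
import numpy as np

def mmc_features(X, y, n_components):
    """Learn a linear feature map by the Maximum Margin Criterion:
    maximize tr(W^T (S_b - S_w) W), i.e. take the top eigenvectors of S_b - S_w.

    X : (n_samples, n_features) raw features (e.g. flattened joint coordinates)
    y : (n_samples,) identity labels
    """
    mean = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)               # within-class scatter
        diff = (mc - mean)[:, None]
        S_b += len(Xc) * diff @ diff.T                # between-class scatter
    vals, vecs = np.linalg.eigh(S_b - S_w)            # symmetric, so eigh is safe
    W = vecs[:, np.argsort(vals)[::-1][:n_components]]
    return W                                          # project with X @ W
```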
A Survey of Visual Analysis of Human Motion and Its Applications
Title | A Survey of Visual Analysis of Human Motion and Its Applications |
Authors | Qifei Wang |
Abstract | This paper summarizes the recent progress in human motion analysis and its applications. First, we review motion capture systems and the representation models of human motion data. Next, we sketch the advanced human motion data processing technologies, including motion data filtering, temporal alignment, and segmentation. The following parts overview the state-of-the-art approaches to action recognition and dynamics measuring, since these two are the most active research areas in human motion analysis. The last part discusses some emerging applications of human motion analysis in healthcare, human-robot interaction, security surveillance, virtual reality, and animation. Promising research topics of human motion analysis in the future are also summarized in the last part. |
Tasks | Motion Capture, Temporal Action Localization |
Published | 2016-08-02 |
URL | http://arxiv.org/abs/1608.00700v2 |
http://arxiv.org/pdf/1608.00700v2.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-of-visual-analysis-of-human-motion |
Repo | |
Framework | |
Coherent structure coloring: identification of coherent structures from sparse data using graph theory
Title | Coherent structure coloring: identification of coherent structures from sparse data using graph theory |
Authors | Kristy L. Schlueter-Kuck, John O. Dabiri |
Abstract | We present a frame-invariant method for detecting coherent structures from Lagrangian flow trajectories that can be sparse in number, as is the case in many fluid mechanics applications of practical interest. The method, based on principles used in graph coloring and spectral graph drawing algorithms, examines a measure of the kinematic dissimilarity of all pairs of fluid trajectories, either measured experimentally (e.g., using particle tracking velocimetry) or computed numerically by advecting fluid particles in the Eulerian velocity field. Coherence is assigned to groups of particles whose kinematics remain similar throughout the time interval for which trajectory data are available, regardless of their physical proximity to one another. Through the use of several analytical and experimental validation cases, this algorithm is shown to robustly detect coherent structures using significantly less flow data than is required by existing spectral graph theory methods. |
Tasks | |
Published | 2016-10-01 |
URL | http://arxiv.org/abs/1610.00197v2 |
http://arxiv.org/pdf/1610.00197v2.pdf | |
PWC | https://paperswithcode.com/paper/coherent-structure-coloring-identification-of |
Repo | |
Framework | |
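A minimal sketch of the spectral "coloring" workflow the abstract outlines: build a pairwise kinematic-dissimilarity graph over trajectories and take a leading generalized eigenvector of its Laplacian. The specific dissimilarity measure below (normalized standard deviation of the inter-particle distance over time) is one common choice and is assumed here, not quoted from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def coherent_structure_coloring(traj):
    """Assign each Lagrangian trajectory a scalar 'color' via spectral graph analysis.

    traj : (n_particles, n_timesteps, n_dims) particle positions over time.
    A pairwise kinematic dissimilarity defines a weighted graph; the generalized
    eigenvector of the graph Laplacian with the largest eigenvalue separates
    kinematically coherent groups of particles.
    """
    n = traj.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(traj[i] - traj[j], axis=-1)      # distance time series
            A[i, j] = A[j, i] = d.std() / (d.mean() + 1e-12)    # kinematic dissimilarity
    D = np.diag(A.sum(axis=1) + 1e-12)
    L = D - A
    vals, vecs = eigh(L, D)          # generalized eigenproblem L x = lambda D x
    return vecs[:, -1]               # eigenvector of the largest eigenvalue
```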
A probabilistic patch based image representation using Conditional Random Field model for image classification
Title | A probabilistic patch based image representation using Conditional Random Field model for image classification |
Authors | Fariborz Taherkhani |
Abstract | In this paper we propose an ordered patch-based method using a Conditional Random Field (CRF) to encode local properties and their spatial relationships in images, addressing texture classification, face recognition, and scene classification problems. Typical image classification approaches represent an image in feature space without considering the spatial causality among its distinctive local properties. In this method, each image is first encoded as a sequence of ordered patches capturing local properties. Second, the sequence of these ordered patches is modeled as a probabilistic feature vector by a CRF to capture the spatial relationships of these local properties. Finally, image classification is performed on this probabilistic image representation. Experimental results on several standard image datasets indicate that the proposed method outperforms several existing image classification methods. |
Tasks | Face Recognition, Image Classification, Scene Classification, Texture Classification |
Published | 2016-07-22 |
URL | http://arxiv.org/abs/1607.06797v2 |
http://arxiv.org/pdf/1607.06797v2.pdf | |
PWC | https://paperswithcode.com/paper/a-probabilistic-patch-based-image |
Repo | |
Framework | |
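The first stage described in the abstract, encoding an image as a raster-ordered sequence of local patch descriptors, is easy to sketch; the patch size, stride, and histogram descriptor below are illustrative choices, and the CRF that models the dependencies between consecutive patches is not shown.

```python
import numpy as np

def ordered_patch_sequence(image, patch=16, stride=16):
    """Encode a grayscale image as a raster-ordered sequence of local patch descriptors.

    This covers only the first step the abstract describes; modeling the spatial
    dependencies between consecutive patches with a CRF is the paper's contribution
    and is not reproduced here.
    """
    h, w = image.shape[:2]
    seq = []
    for y in range(0, h - patch + 1, stride):        # row-major order keeps the
        for x in range(0, w - patch + 1, stride):    # spatial ordering explicit
            p = image[y:y + patch, x:x + patch].astype(float)
            hist, _ = np.histogram(p, bins=16, range=(0, 255), density=True)
            seq.append(hist)                          # one local descriptor per patch
    return np.asarray(seq)                            # (n_patches, 16), raster order
```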
The KIT Motion-Language Dataset
Title | The KIT Motion-Language Dataset |
Authors | Matthias Plappert, Christian Mandery, Tamim Asfour |
Abstract | Linking human motion and natural language is of great interest for the generation of semantic representations of human activities as well as for the generation of robot activities based on natural language input. However, while there have been years of research in this area, no standardized and openly available dataset exists to support the development and evaluation of such systems. We therefore propose the KIT Motion-Language Dataset, which is large, open, and extensible. We aggregate data from multiple motion capture databases and include them in our dataset using a unified representation that is independent of the capture system or marker set, making it easy to work with the data regardless of its origin. To obtain motion annotations in natural language, we apply a crowd-sourcing approach and a web-based tool that was specifically built for this purpose, the Motion Annotation Tool. We thoroughly document the annotation process itself and discuss gamification methods that we used to keep annotators motivated. We further propose a novel method, perplexity-based selection, which systematically selects motions for further annotation that are either under-represented in our dataset or that have erroneous annotations. We show that our method mitigates the two aforementioned problems and ensures a systematic annotation process. We provide an in-depth analysis of the structure and contents of our resulting dataset, which, as of October 10, 2016, contains 3911 motions with a total duration of 11.23 hours and 6278 annotations in natural language that contain 52,903 words. We believe this makes our dataset an excellent choice that enables more transparent and comparable research in this important area. |
Tasks | Motion Capture |
Published | 2016-07-13 |
URL | http://arxiv.org/abs/1607.03827v2 |
http://arxiv.org/pdf/1607.03827v2.pdf | |
PWC | https://paperswithcode.com/paper/the-kit-motion-language-dataset |
Repo | |
Framework | |
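Perplexity-based selection, as described, scores each motion by how surprising its current annotations are under a language model trained on the whole corpus and prioritizes the most surprising (or unannotated) motions. The add-alpha bigram model below is a toy stand-in for whatever model the authors actually use.

```python
import math
from collections import Counter

def bigram_perplexity(sentence, unigrams, bigrams, vocab_size, alpha=1.0):
    """Perplexity of a tokenized sentence under an add-alpha smoothed bigram model."""
    logp, tokens = 0.0, ["<s>"] + sentence + ["</s>"]
    for prev, cur in zip(tokens, tokens[1:]):
        num = bigrams[(prev, cur)] + alpha
        den = unigrams[prev] + alpha * vocab_size
        logp += math.log(num / den)
    return math.exp(-logp / (len(tokens) - 1))

def select_for_annotation(annotations, k):
    """Pick the k motions whose existing annotations the corpus model finds most
    surprising (highest perplexity); unannotated motions are selected first.

    annotations : dict motion_id -> list of tokenized annotation sentences
    """
    unigrams, bigrams = Counter(), Counter()
    for sents in annotations.values():
        for s in sents:
            toks = ["<s>"] + s + ["</s>"]
            unigrams.update(toks)
            bigrams.update(zip(toks, toks[1:]))
    vocab = len(unigrams)

    def score(motion_id):
        sents = annotations[motion_id]
        if not sents:                       # under-represented: annotate first
            return float("inf")
        return max(bigram_perplexity(s, unigrams, bigrams, vocab) for s in sents)

    return sorted(annotations, key=score, reverse=True)[:k]
```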
MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild
Title | MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild |
Authors | Grégory Rogez, Cordelia Schmid |
Abstract | This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3D pose annotations. We introduce an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D Motion Capture (MoCap) data. Given a candidate 3D pose, our algorithm selects for each joint an image whose 2D pose locally matches the projected 3D pose. The selected images are then combined to generate a new synthetic image by stitching local image patches in a kinematically constrained manner. The resulting images are used to train an end-to-end CNN for full-body 3D pose estimation. We cluster the training data into a large number of pose classes and tackle pose estimation as a K-way classification problem. Such an approach is viable only with large training sets such as ours. Our method outperforms the state of the art in terms of 3D pose estimation in controlled environments (Human3.6M) and shows promising results for in-the-wild images (LSP). This demonstrates that CNNs trained on artificial images generalize well to real images. |
Tasks | 3D Human Pose Estimation, 3D Pose Estimation, Data Augmentation, Motion Capture, Pose Estimation |
Published | 2016-07-07 |
URL | http://arxiv.org/abs/1607.02046v2 |
http://arxiv.org/pdf/1607.02046v2.pdf | |
PWC | https://paperswithcode.com/paper/mocap-guided-data-augmentation-for-3d-pose |
Repo | |
Framework | |
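One step of the synthesis pipeline, selecting for each joint the real image whose local 2D pose best matches the projected candidate 3D pose, can be sketched as a nearest-neighbour search. The joint-centred normalization and the neighbour sets below are assumptions, and the kinematically constrained stitching that follows in the paper is not shown.

```python
import numpy as np

def match_images_per_joint(proj2d, db_poses, joint_neighbors):
    """For each joint of a projected candidate 3D pose, pick the database image
    whose local 2D pose (the joint and its kinematic neighbors) is closest.

    proj2d          : (n_joints, 2) projection of the candidate 3D pose
    db_poses        : (n_images, n_joints, 2) 2D pose annotations of real images
    joint_neighbors : dict joint index -> list of neighboring joint indices
    Returns one database image index per joint.
    """
    picks = []
    for j, nbrs in joint_neighbors.items():
        idx = [j] + list(nbrs)
        local = proj2d[idx] - proj2d[j]                   # joint-centered frame
        cand = db_poses[:, idx, :] - db_poses[:, j:j + 1, :]
        dist = np.linalg.norm(cand - local, axis=-1).sum(axis=-1)
        picks.append(int(np.argmin(dist)))                # best-matching image
    return picks
```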
First Result on Arabic Neural Machine Translation
Title | First Result on Arabic Neural Machine Translation |
Authors | Amjad Almahairi, Kyunghyun Cho, Nizar Habash, Aaron Courville |
Abstract | Neural machine translation has become a major alternative to widely used phrase-based statistical machine translation. We notice, however, that much of the research on neural machine translation has focused on European languages despite its language-agnostic nature. In this paper, we apply neural machine translation to the task of Arabic translation (Ar<->En) and compare it against a standard phrase-based translation system. We run an extensive comparison using various configurations for preprocessing Arabic script and show that the phrase-based and neural translation systems perform comparably to each other and that proper preprocessing of Arabic script has a similar effect on both systems. We observe, however, that neural machine translation significantly outperforms the phrase-based system on an out-of-domain test set, making it attractive for real-world deployment. |
Tasks | Machine Translation |
Published | 2016-06-08 |
URL | http://arxiv.org/abs/1606.02680v1 |
http://arxiv.org/pdf/1606.02680v1.pdf | |
PWC | https://paperswithcode.com/paper/first-result-on-arabic-neural-machine |
Repo | |
Framework | |
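The paper's comparison hinges on how the Arabic script is preprocessed. The sketch below shows only the simplest, character-level orthographic normalization commonly applied to Arabic text (diacritic and tatweel removal, alef/ya/ta-marbuta folding); the morphological tokenization schemes the paper also evaluates are not covered.

```python
import re

# Common orthographic normalization for Arabic text; only the simplest of the
# preprocessing schemes the paper compares.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")   # harakat, sukun, dagger alef
TATWEEL = "\u0640"

def normalize_arabic(text):
    text = DIACRITICS.sub("", text)                  # drop short-vowel marks
    text = text.replace(TATWEEL, "")                 # drop elongation character
    for alef in ("\u0622", "\u0623", "\u0625"):      # alef madda / hamza above / below
        text = text.replace(alef, "\u0627")          # -> bare alef
    text = text.replace("\u0649", "\u064A")          # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")          # ta marbuta -> ha
    return text
```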
Fractal Dimension Invariant Filtering and Its CNN-based Implementation
Title | Fractal Dimension Invariant Filtering and Its CNN-based Implementation |
Authors | Hongteng Xu, Junchi Yan, Nils Persson, Weiyao Lin, Hongyuan Zha |
Abstract | Fractal analysis has been widely used in computer vision, especially in texture image processing and texture analysis. The key concept of fractal-based image models is the fractal dimension, which is invariant under bi-Lipschitz transformations of an image and is thus capable of robustly representing its intrinsic structural information. However, the invariance of the fractal dimension generally does not hold after filtering, which limits the application of fractal-based image models. In this paper, we propose a novel fractal dimension invariant filtering (FDIF) method, extending the invariance of the fractal dimension to filtering operations. Utilizing the notion of local self-similarity, we first develop a local fractal model for images. By adding a nonlinear post-processing step behind anisotropic filter banks, we demonstrate that the proposed filtering method is capable of preserving the local invariance of the fractal dimension of an image. Meanwhile, we show that the FDIF method can be approximately re-instantiated via a CNN-based architecture, where the convolution layer extracts the anisotropic structure of the image and the nonlinear layer enhances the structure by preserving the local fractal dimension. The proposed filtering method provides a novel geometric interpretation of CNN-based image models. Focusing on a challenging image processing task, detecting complicated curves in texture-like images, the proposed method obtains results superior to state-of-the-art approaches. |
Tasks | Texture Classification |
Published | 2016-03-19 |
URL | http://arxiv.org/abs/1603.06036v3 |
http://arxiv.org/pdf/1603.06036v3.pdf | |
PWC | https://paperswithcode.com/paper/fractal-dimension-invariant-filtering-and-its |
Repo | |
Framework | |
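The fractal dimension at the heart of FDIF is usually estimated by box counting: the number of occupied boxes N(s) scales as s^(-D). The sketch below estimates D for a binary local patch; the anisotropic filter bank and the nonlinear post-processing step that make the filtering dimension-preserving are the paper's contribution and are not reproduced.

```python
import numpy as np

def box_counting_dimension(patch):
    """Estimate the fractal (box-counting) dimension of a square binary patch:
    the slope of log N(s) versus log(1/s), where N(s) counts occupied s x s boxes."""
    size = patch.shape[0]
    scales = [s for s in (1, 2, 4, 8, 16) if s < size]
    counts = []
    for s in scales:
        n = size // s
        blocks = patch[:n * s, :n * s].reshape(n, s, n, s)
        occupied = blocks.any(axis=(1, 3)).sum()      # boxes containing structure
        counts.append(max(occupied, 1))
    slope, _ = np.polyfit(np.log(1.0 / np.array(scales)), np.log(counts), 1)
    return slope
```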
Evaluating semantic models with word-sentence relatedness
Title | Evaluating semantic models with word-sentence relatedness |
Authors | Kimberly Glasgow, Matthew Roos, Amy Haufler, Mark Chevillet, Michael Wolmetz |
Abstract | Semantic textual similarity (STS) systems are designed to encode and evaluate the semantic similarity between words, phrases, sentences, and documents. One method for assessing the quality or authenticity of semantic information encoded in these systems is by comparison with human judgments. A data set for evaluating semantic models was developed consisting of 775 English word-sentence pairs, each annotated for semantic relatedness by human raters engaged in a Maximum Difference Scaling (MDS) task, as well as a faster alternative task. As a sample application of this relatedness data, behavior-based relatedness was compared to the relatedness computed via four off-the-shelf STS models: n-gram, Latent Semantic Analysis (LSA), Word2Vec, and UMBC Ebiquity. Some STS models captured much of the variance in the human judgments collected, but they were not sensitive to the implicatures and entailments that were processed and considered by the participants. All text stimuli and judgment data have been made freely available. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2016-03-23 |
URL | http://arxiv.org/abs/1603.07253v2 |
http://arxiv.org/pdf/1603.07253v2.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-semantic-models-with-word-sentence |
Repo | |
Framework | |
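Comparing a semantic model against the human judgments collected here boils down to a rank correlation between model similarity scores and the behavioural relatedness ratings. In the sketch below, averaged word vectors stand in for the STS models the paper evaluates; the embedding table and pair format are assumptions.

```python
import numpy as np
from scipy.stats import spearmanr

def sentence_vector(tokens, embeddings):
    """Average word vectors; a simple stand-in for the STS models the paper compares."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else None

def evaluate_relatedness(pairs, human_scores, embeddings):
    """Spearman correlation between model similarity and human relatedness judgments.

    pairs        : list of (word, sentence_tokens)
    human_scores : list of human relatedness ratings, same order as pairs
    embeddings   : dict word -> vector (e.g. Word2Vec vectors)
    """
    model_scores = []
    for word, sent in pairs:
        w, s = embeddings.get(word), sentence_vector(sent, embeddings)
        if w is None or s is None:
            model_scores.append(0.0)       # out-of-vocabulary fallback
            continue
        cos = float(np.dot(w, s) / (np.linalg.norm(w) * np.linalg.norm(s)))
        model_scores.append(cos)
    rho, _ = spearmanr(model_scores, human_scores)
    return rho
```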
Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-related Applications
Title | Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-related Applications |
Authors | Ciprian Corneanu, Marc Oliu, Jeffrey F. Cohn, Sergio Escalera |
Abstract | Facial expressions are an important way through which humans interact socially. Building a system capable of automatically recognizing facial expressions from images and video has been an intense field of study in recent years. Interpreting such expressions remains challenging, and much research is needed about the way they relate to human affect. This paper presents a general overview of automatic RGB, 3D, thermal and multimodal facial expression analysis. We define a new taxonomy for the field, encompassing all steps from face detection to facial expression recognition, and describe and classify the state-of-the-art methods accordingly. We also present the important datasets and the benchmarking of the most influential methods. We conclude with a general discussion about trends, important questions and future lines of research. |
Tasks | Face Detection, Facial Expression Recognition |
Published | 2016-06-10 |
URL | http://arxiv.org/abs/1606.03237v1 |
http://arxiv.org/pdf/1606.03237v1.pdf | |
PWC | https://paperswithcode.com/paper/survey-on-rgb-3d-thermal-and-multimodal |
Repo | |
Framework | |
Proving the Incompatibility of Efficiency and Strategyproofness via SMT Solving
Title | Proving the Incompatibility of Efficiency and Strategyproofness via SMT Solving |
Authors | Florian Brandl, Felix Brandt, Manuel Eberl, Christian Geist |
Abstract | Two important requirements when aggregating the preferences of multiple agents are that the outcome should be economically efficient and the aggregation mechanism should not be manipulable. In this paper, we provide a computer-aided proof of a sweeping impossibility using these two conditions for randomized aggregation mechanisms. More precisely, we show that every efficient aggregation mechanism can be manipulated for all expected utility representations of the agents’ preferences. This settles an open problem and strengthens a number of existing theorems, including statements that were shown within the special domain of assignment. Our proof is obtained by formulating the claim as a satisfiability problem over predicates from real-valued arithmetic, which is then checked using an SMT (satisfiability modulo theories) solver. In order to verify the correctness of the result, a minimal unsatisfiable set of constraints returned by the SMT solver was translated back into a proof in higher-order logic, which was automatically verified by an interactive theorem prover. To the best of our knowledge, this is the first application of SMT solvers in computational social choice. |
Tasks | |
Published | 2016-04-19 |
URL | http://arxiv.org/abs/1604.05692v4 |
http://arxiv.org/pdf/1604.05692v4.pdf | |
PWC | https://paperswithcode.com/paper/proving-the-incompatibility-of-efficiency-and |
Repo | |
Framework | |
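The proof workflow described in the abstract (encode the claim as constraints over real arithmetic, let an SMT solver report unsatisfiability, then extract an unsatisfiable core for independent verification) can be illustrated on a toy scale with Z3's Python bindings. The three constraints below are purely illustrative and have nothing to do with the paper's actual encoding of efficiency and strategyproofness.

```python
from z3 import Real, Solver, unsat

x, y = Real("x"), Real("y")

s = Solver()
s.set(unsat_core=True)
# Three jointly inconsistent linear constraints over the reals (toy example only).
s.assert_and_track(x + y >= 1, "c1")
s.assert_and_track(x <= 0, "c2")
s.assert_and_track(y <= 0, "c3")

if s.check() == unsat:
    # An unsatisfiable subset of the tracked constraints; the paper further
    # reduces such a core and re-verifies it in an interactive theorem prover.
    print("unsat core:", s.unsat_core())
```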
mdBrief - A Fast Online Adaptable, Distorted Binary Descriptor for Real-Time Applications Using Calibrated Wide-Angle Or Fisheye Cameras
Title | mdBrief - A Fast Online Adaptable, Distorted Binary Descriptor for Real-Time Applications Using Calibrated Wide-Angle Or Fisheye Cameras |
Authors | Steffen Urban, Stefan Hinz |
Abstract | Fast binary descriptors form the core of many vision-based applications with real-time demands, such as object detection, Visual Odometry or SLAM. Commonly it is assumed that the acquired images, and thus the patches extracted around keypoints, originate from a perspective projection, ignoring image distortion or entirely different types of projections such as omnidirectional or fisheye. Usually the deviations from a perfect perspective projection are corrected by undistortion. The latter, however, introduces severe artifacts as the camera's field of view gets larger. In this paper, we propose a distorted and masked version of the BRIEF descriptor for calibrated cameras. Instead of correcting the distortion holistically, we distort the binary tests and thus adapt the descriptor to different image regions. |
Tasks | Object Detection, Visual Odometry |
Published | 2016-10-25 |
URL | http://arxiv.org/abs/1610.07804v1 |
http://arxiv.org/pdf/1610.07804v1.pdf | |
PWC | https://paperswithcode.com/paper/mdbrief-a-fast-online-adaptable-distorted |
Repo | |
Framework | |
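The core idea, distorting the binary tests instead of undistorting the image, can be sketched as follows. The `project` callback standing in for the calibrated (e.g. fisheye) camera model is an assumed interface rather than an API from the paper's code, and the descriptor masking mentioned in the title is omitted.

```python
import numpy as np

def distorted_brief(image, keypoint, tests, project):
    """BRIEF-style binary descriptor whose test pattern is distorted per keypoint.

    tests   : (n_bits, 2, 2) pairs of test offsets defined on the ideal
              (undistorted) image plane around the keypoint
    project : callable mapping (keypoint, ideal-plane offset) to pixel coordinates
              under the calibrated camera model -- an assumed interface
    Instead of undistorting the whole patch, each binary test is warped into the
    distorted image, which is the core idea behind this kind of descriptor.
    """
    h, w = image.shape[:2]
    bits = np.zeros(len(tests), dtype=np.uint8)
    for i, (a, b) in enumerate(tests):
        (ax, ay), (bx, by) = project(keypoint, a), project(keypoint, b)
        ax, ay = int(np.clip(ax, 0, w - 1)), int(np.clip(ay, 0, h - 1))
        bx, by = int(np.clip(bx, 0, w - 1)), int(np.clip(by, 0, h - 1))
        bits[i] = 1 if image[ay, ax] < image[by, bx] else 0   # intensity comparison
    return np.packbits(bits)                                   # compact binary string
```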