Paper Group ANR 568
Efficient phase retrieval based on dark fringe recognition with an ability of bypassing invalid fringes. Deep Active Learning for Dialogue Generation. EgoCap: Egocentric Marker-less Motion Capture with Two Fisheye Cameras. Learning Robust Features for Gait Recognition by Maximum Margin Criterion. A Survey of Visual Analysis of Human Motion and Its …
Efficient phase retrieval based on dark fringe recognition with an ability of bypassing invalid fringes
Title | Efficient phase retrieval based on dark fringe recognition with an ability of bypassing invalid fringes |
Authors | Wen-Kai Yu, An-Dong Xiong, Xu-Ri Yao, Guang-Jie Zhai, Qing Zhao |
Abstract | This paper discusses the noisy phase retrieval problem: recovering a complex image signal with independent noise from quadratic measurements. Inspired by the dark fringes shown in the measured images of the array detector, a novel phase retrieval approach is proposed and demonstrated both theoretically and experimentally to recognize the dark fringes and bypass the invalid fringes. A more accurate relative phase ratio between two arbitrary pixels is achieved by calculating the multiplicative ratios (or the sum of phase differences) on the path between them. Then the object phase image can be reconstructed precisely. Our approach is a good choice for retrieving high-quality phase images from noisy signals and has many potential applications in fields such as X-ray crystallography, diffractive imaging, and so on. |
Tasks | |
Published | 2016-12-14 |
URL | http://arxiv.org/abs/1612.04733v1 |
http://arxiv.org/pdf/1612.04733v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-phase-retrieval-based-on-dark |
Repo | |
Framework | |
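The abstract's central idea, accumulating phase differences along a pixel path while bypassing invalid (dark-fringe) pixels, can be illustrated with a small sketch. The grid routing, the `valid` mask, and the wrapped-difference accumulation below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np
from collections import deque

def bfs_path(valid, start, goal):
    """Shortest 4-connected path from start to goal through pixels marked valid."""
    h, w = valid.shape
    prev = {start: None}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        if p == goal:
            break
        y, x = p
        for q in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= q[0] < h and 0 <= q[1] < w and valid[q] and q not in prev:
                prev[q] = p
                queue.append(q)
    if goal not in prev:
        return None
    path, p = [], goal
    while p is not None:
        path.append(p)
        p = prev[p]
    return path[::-1]

def relative_phase(phase, valid, start, goal):
    """Sum wrapped phase differences along a path that bypasses invalid pixels."""
    path = bfs_path(valid, start, goal)
    if path is None:
        raise ValueError("no valid path between the two pixels")
    steps = [np.angle(np.exp(1j * (phase[b] - phase[a])))   # wrap each step to (-pi, pi]
             for a, b in zip(path, path[1:])]
    total = sum(steps)
    return (total + np.pi) % (2 * np.pi) - np.pi             # wrap the accumulated phase
```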
Deep Active Learning for Dialogue Generation
Title | Deep Active Learning for Dialogue Generation |
Authors | Nabiha Asghar, Pascal Poupart, Xin Jiang, Hang Li |
Abstract | We propose an online, end-to-end, neural generative conversational model for open-domain dialogue. It is trained using a unique combination of offline two-phase supervised learning and online human-in-the-loop active learning. While most existing research proposes offline supervision or hand-crafted reward functions for online reinforcement, we devise a novel interactive learning mechanism based on hamming-diverse beam search for response generation and one-character user-feedback at each step. Experiments show that our model inherently promotes the generation of semantically relevant and interesting responses, and can be used to train agents with customized personas, moods and conversational styles. |
Tasks | Active Learning, Dialogue Generation |
Published | 2016-12-12 |
URL | http://arxiv.org/abs/1612.03929v5 |
http://arxiv.org/pdf/1612.03929v5.pdf | |
PWC | https://paperswithcode.com/paper/deep-active-learning-for-dialogue-generation |
Repo | |
Framework | |
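As a rough illustration of the Hamming-diverse expansion mentioned in the abstract, the sketch below penalizes, at a single decoding step, tokens that earlier beams have already picked; the penalty weight `lam` and the greedy per-beam expansion are simplifying assumptions, not the paper's exact procedure.

```python
import numpy as np

def hamming_diverse_step(log_probs, lam=0.5):
    """One decoding step with a Hamming-diversity penalty (sketch).

    log_probs : (num_beams, vocab_size) token log-probabilities, one row per beam.
    Beams are expanded one after another; each beam's scores are penalized by how
    often earlier beams have already picked a token at this step, so the selected
    continuations differ from each other.
    """
    num_beams, vocab_size = log_probs.shape
    usage = np.zeros(vocab_size)          # how many earlier beams chose each token
    picks = []
    for b in range(num_beams):
        scores = log_probs[b] - lam * usage
        token = int(np.argmax(scores))
        picks.append(token)
        usage[token] += 1.0
    return picks
```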
EgoCap: Egocentric Marker-less Motion Capture with Two Fisheye Cameras
Title | EgoCap: Egocentric Marker-less Motion Capture with Two Fisheye Cameras |
Authors | Helge Rhodin, Christian Richardt, Dan Casas, Eldar Insafutdinov, Mohammad Shafiei, Hans-Peter Seidel, Bernt Schiele, Christian Theobalt |
Abstract | Marker-based and marker-less optical skeletal motion-capture methods use an outside-in arrangement of cameras placed around a scene, with viewpoints converging on the center. They often cause discomfort through the marker suits they may require, and their recording volume is severely restricted and often constrained to indoor scenes with controlled backgrounds. Alternative suit-based systems use several inertial measurement units or an exoskeleton to capture motion. This makes capturing independent of a confined volume, but requires substantial, often constraining, and hard-to-set-up body instrumentation. We therefore propose a new method for real-time, marker-less and egocentric motion capture which estimates the full-body skeleton pose from a lightweight stereo pair of fisheye cameras that are attached to a helmet or virtual reality headset. It combines the strength of a new generative pose estimation framework for fisheye views with a ConvNet-based body-part detector trained on a large new dataset. Our inside-in method captures full-body motion in general indoor and outdoor scenes, and also in crowded scenes with many people in close vicinity. The captured user can freely move around, which enables reconstruction of larger-scale activities and is particularly useful in virtual reality to freely roam and interact while seeing the fully motion-captured virtual body. |
Tasks | Motion Capture, Pose Estimation |
Published | 2016-09-23 |
URL | http://arxiv.org/abs/1609.07306v1 |
http://arxiv.org/pdf/1609.07306v1.pdf | |
PWC | https://paperswithcode.com/paper/egocap-egocentric-marker-less-motion-capture-1 |
Repo | |
Framework | |
Learning Robust Features for Gait Recognition by Maximum Margin Criterion
Title | Learning Robust Features for Gait Recognition by Maximum Margin Criterion |
Authors | Michal Balazia, Petr Sojka |
Abstract | In the field of gait recognition from motion capture data, designing human-interpretable gait features is a common practice of many fellow researchers. To refrain from ad-hoc schemes and to find maximally discriminative features we may need to explore beyond the limits of human interpretability. This paper contributes to the state-of-the-art with a machine learning approach for extracting robust gait features directly from raw joint coordinates. The features are learned by a modification of Linear Discriminant Analysis with Maximum Margin Criterion so that the identities are maximally separated and, in combination with an appropriate classifier, used for gait recognition. Experiments on the CMU MoCap database show that this method outperforms eight other relevant methods in terms of the distribution of biometric templates in respective feature spaces expressed in four class separability coefficients. Additional experiments indicate that this method is a leading concept for rank-based classifier systems. |
Tasks | Gait Recognition, Motion Capture |
Published | 2016-09-14 |
URL | http://arxiv.org/abs/1609.04392v5 |
http://arxiv.org/pdf/1609.04392v5.pdf | |
PWC | https://paperswithcode.com/paper/learning-robust-features-for-gait-recognition |
Repo | |
Framework | |
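The Maximum Margin Criterion the abstract refers to has a standard closed-form recipe: take the leading eigenvectors of the difference between the between-class and within-class scatter matrices. The sketch below shows that generic recipe; the paper's specific modification of LDA and the classifier it is combined with are not reproduced.

```python
import numpy as np

def mmc_features(X, y, n_components):
    """Learn a linear feature map by the Maximum Margin Criterion:
    maximize tr(W^T (S_b - S_w) W), i.e. take the top eigenvectors of S_b - S_w.

    X : (n_samples, n_features) raw features (e.g. flattened joint coordinates)
    y : (n_samples,) identity labels
    """
    mean = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)               # within-class scatter
        diff = (mc - mean)[:, None]
        S_b += len(Xc) * diff @ diff.T                # between-class scatter
    vals, vecs = np.linalg.eigh(S_b - S_w)            # symmetric, so eigh is safe
    W = vecs[:, np.argsort(vals)[::-1][:n_components]]
    return W                                          # project with X @ W
```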
A Survey of Visual Analysis of Human Motion and Its Applications
Title | A Survey of Visual Analysis of Human Motion and Its Applications |
Authors | Qifei Wang |
Abstract | This paper summarizes the recent progress in human motion analysis and its applications. First, we review motion capture systems and the representation models of human motion data. Next, we sketch the advanced human motion data processing technologies, including motion data filtering, temporal alignment, and segmentation. The following parts overview the state-of-the-art approaches to action recognition and dynamics measuring, since these two are the most active research areas in human motion analysis. The last part discusses some emerging applications of human motion analysis in healthcare, human-robot interaction, security surveillance, virtual reality, and animation. Promising research topics of human motion analysis in the future are also summarized in the last part. |
Tasks | Motion Capture, Temporal Action Localization |
Published | 2016-08-02 |
URL | http://arxiv.org/abs/1608.00700v2 |
http://arxiv.org/pdf/1608.00700v2.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-of-visual-analysis-of-human-motion |
Repo | |
Framework | |
Coherent structure coloring: identification of coherent structures from sparse data using graph theory
Title | Coherent structure coloring: identification of coherent structures from sparse data using graph theory |
Authors | Kristy L. Schlueter-Kuck, John O. Dabiri |
Abstract | We present a frame-invariant method for detecting coherent structures from Lagrangian flow trajectories that can be sparse in number, as is the case in many fluid mechanics applications of practical interest. The method, based on principles used in graph coloring and spectral graph drawing algorithms, examines a measure of the kinematic dissimilarity of all pairs of fluid trajectories, either measured experimentally (e.g., using particle tracking velocimetry) or computed numerically by advecting fluid particles in the Eulerian velocity field. Coherence is assigned to groups of particles whose kinematics remain similar throughout the time interval for which trajectory data are available, regardless of their physical proximity to one another. Through the use of several analytical and experimental validation cases, this algorithm is shown to robustly detect coherent structures using significantly less flow data than is required by existing spectral graph theory methods. |
Tasks | |
Published | 2016-10-01 |
URL | http://arxiv.org/abs/1610.00197v2 |
http://arxiv.org/pdf/1610.00197v2.pdf | |
PWC | https://paperswithcode.com/paper/coherent-structure-coloring-identification-of |
Repo | |
Framework | |
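A minimal sketch of the spectral "coloring" workflow the abstract outlines: build a pairwise kinematic-dissimilarity graph over trajectories and take a leading generalized eigenvector of its Laplacian. The specific dissimilarity measure below (normalized standard deviation of the inter-particle distance over time) is one common choice and is assumed here, not quoted from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def coherent_structure_coloring(traj):
    """Assign each Lagrangian trajectory a scalar 'color' via spectral graph analysis.

    traj : (n_particles, n_timesteps, n_dims) particle positions over time.
    A pairwise kinematic dissimilarity defines a weighted graph; the generalized
    eigenvector of the graph Laplacian with the largest eigenvalue separates
    kinematically coherent groups of particles.
    """
    n = traj.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(traj[i] - traj[j], axis=-1)      # distance time series
            A[i, j] = A[j, i] = d.std() / (d.mean() + 1e-12)    # kinematic dissimilarity
    D = np.diag(A.sum(axis=1) + 1e-12)
    L = D - A
    vals, vecs = eigh(L, D)          # generalized eigenproblem L x = lambda D x
    return vecs[:, -1]               # eigenvector of the largest eigenvalue
```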
A probabilistic patch based image representation using Conditional Random Field model for image classification
Title | A probabilistic patch based image representation using Conditional Random Field model for image classification |
Authors | Fariborz Taherkhani |
Abstract | In this paper we propose an ordered patch-based method using a Conditional Random Field (CRF) to encode local properties and their spatial relationships in images, addressing texture classification, face recognition, and scene classification problems. Typical image classification approaches represent an image in feature space without considering the spatial causality among its distinctive local properties. In this method, each image is first encoded as a sequence of ordered patches capturing local properties. Second, the sequence of these ordered patches is modeled as a probabilistic feature vector by a CRF to capture the spatial relationships of these local properties. Finally, image classification is performed on this probabilistic image representation. Experimental results on several standard image datasets indicate that the proposed method outperforms several existing image classification methods. |
Tasks | Face Recognition, Image Classification, Scene Classification, Texture Classification |
Published | 2016-07-22 |
URL | http://arxiv.org/abs/1607.06797v2 |
http://arxiv.org/pdf/1607.06797v2.pdf | |
PWC | https://paperswithcode.com/paper/a-probabilistic-patch-based-image |
Repo | |
Framework | |
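The first stage described in the abstract, encoding an image as a raster-ordered sequence of local patch descriptors, is easy to sketch; the patch size, stride, and histogram descriptor below are illustrative choices, and the CRF that models the dependencies between consecutive patches is not shown.

```python
import numpy as np

def ordered_patch_sequence(image, patch=16, stride=16):
    """Encode a grayscale image as a raster-ordered sequence of local patch descriptors.

    This covers only the first step the abstract describes; modeling the spatial
    dependencies between consecutive patches with a CRF is the paper's contribution
    and is not reproduced here.
    """
    h, w = image.shape[:2]
    seq = []
    for y in range(0, h - patch + 1, stride):        # row-major order keeps the
        for x in range(0, w - patch + 1, stride):    # spatial ordering explicit
            p = image[y:y + patch, x:x + patch].astype(float)
            hist, _ = np.histogram(p, bins=16, range=(0, 255), density=True)
            seq.append(hist)                          # one local descriptor per patch
    return np.asarray(seq)                            # (n_patches, 16), raster order
```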
The KIT Motion-Language Dataset
Title | The KIT Motion-Language Dataset |
Authors | Matthias Plappert, Christian Mandery, Tamim Asfour |
Abstract | Linking human motion and natural language is of great interest for the generation of semantic representations of human activities as well as for the generation of robot activities based on natural language input. However, while there have been years of research in this area, no standardized and openly available dataset exists to support the development and evaluation of such systems. We therefore propose the KIT Motion-Language Dataset, which is large, open, and extensible. We aggregate data from multiple motion capture databases and include them in our dataset using a unified representation that is independent of the capture system or marker set, making it easy to work with the data regardless of its origin. To obtain motion annotations in natural language, we apply a crowd-sourcing approach and a web-based tool that was specifically built for this purpose, the Motion Annotation Tool. We thoroughly document the annotation process itself and discuss gamification methods that we used to keep annotators motivated. We further propose a novel method, perplexity-based selection, which systematically selects motions for further annotation that are either under-represented in our dataset or that have erroneous annotations. We show that our method mitigates the two aforementioned problems and ensures a systematic annotation process. We provide an in-depth analysis of the structure and contents of our resulting dataset, which, as of October 10, 2016, contains 3911 motions with a total duration of 11.23 hours and 6278 annotations in natural language that contain 52,903 words. We believe this makes our dataset an excellent choice that enables more transparent and comparable research in this important area. |
Tasks | Motion Capture |
Published | 2016-07-13 |
URL | http://arxiv.org/abs/1607.03827v2 |
http://arxiv.org/pdf/1607.03827v2.pdf | |
PWC | https://paperswithcode.com/paper/the-kit-motion-language-dataset |
Repo | |
Framework | |
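Perplexity-based selection, as described, scores each motion by how surprising its current annotations are under a language model trained on the whole corpus and prioritizes the most surprising (or unannotated) motions. The add-alpha bigram model below is a toy stand-in for whatever model the authors actually use.

```python
import math
from collections import Counter

def bigram_perplexity(sentence, unigrams, bigrams, vocab_size, alpha=1.0):
    """Perplexity of a tokenized sentence under an add-alpha smoothed bigram model."""
    logp, tokens = 0.0, ["<s>"] + sentence + ["</s>"]
    for prev, cur in zip(tokens, tokens[1:]):
        num = bigrams[(prev, cur)] + alpha
        den = unigrams[prev] + alpha * vocab_size
        logp += math.log(num / den)
    return math.exp(-logp / (len(tokens) - 1))

def select_for_annotation(annotations, k):
    """Pick the k motions whose existing annotations the corpus model finds most
    surprising (highest perplexity); unannotated motions are selected first.

    annotations : dict motion_id -> list of tokenized annotation sentences
    """
    unigrams, bigrams = Counter(), Counter()
    for sents in annotations.values():
        for s in sents:
            toks = ["<s>"] + s + ["</s>"]
            unigrams.update(toks)
            bigrams.update(zip(toks, toks[1:]))
    vocab = len(unigrams)

    def score(motion_id):
        sents = annotations[motion_id]
        if not sents:                       # under-represented: annotate first
            return float("inf")
        return max(bigram_perplexity(s, unigrams, bigrams, vocab) for s in sents)

    return sorted(annotations, key=score, reverse=True)[:k]
```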
MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild
Title | MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild |
Authors | Grégory Rogez, Cordelia Schmid |
Abstract | This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3D pose annotations. We introduce an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D Motion Capture (MoCap) data. Given a candidate 3D pose, our algorithm selects for each joint an image whose 2D pose locally matches the projected 3D pose. The selected images are then combined to generate a new synthetic image by stitching local image patches in a kinematically constrained manner. The resulting images are used to train an end-to-end CNN for full-body 3D pose estimation. We cluster the training data into a large number of pose classes and tackle pose estimation as a K-way classification problem. Such an approach is viable only with large training sets such as ours. Our method outperforms the state of the art in terms of 3D pose estimation in controlled environments (Human3.6M) and shows promising results for in-the-wild images (LSP). This demonstrates that CNNs trained on artificial images generalize well to real images. |
Tasks | 3D Human Pose Estimation, 3D Pose Estimation, Data Augmentation, Motion Capture, Pose Estimation |
Published | 2016-07-07 |
URL | http://arxiv.org/abs/1607.02046v2 |
http://arxiv.org/pdf/1607.02046v2.pdf | |
PWC | https://paperswithcode.com/paper/mocap-guided-data-augmentation-for-3d-pose |
Repo | |
Framework | |
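One step of the synthesis pipeline, selecting for each joint the real image whose local 2D pose best matches the projected candidate 3D pose, can be sketched as a nearest-neighbour search. The joint-centred normalization and the neighbour sets below are assumptions, and the kinematically constrained stitching that follows in the paper is not shown.

```python
import numpy as np

def match_images_per_joint(proj2d, db_poses, joint_neighbors):
    """For each joint of a projected candidate 3D pose, pick the database image
    whose local 2D pose (the joint and its kinematic neighbors) is closest.

    proj2d          : (n_joints, 2) projection of the candidate 3D pose
    db_poses        : (n_images, n_joints, 2) 2D pose annotations of real images
    joint_neighbors : dict joint index -> list of neighboring joint indices
    Returns one database image index per joint.
    """
    picks = []
    for j, nbrs in joint_neighbors.items():
        idx = [j] + list(nbrs)
        local = proj2d[idx] - proj2d[j]                   # joint-centered frame
        cand = db_poses[:, idx, :] - db_poses[:, j:j + 1, :]
        dist = np.linalg.norm(cand - local, axis=-1).sum(axis=-1)
        picks.append(int(np.argmin(dist)))                # best-matching image
    return picks
```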
First Result on Arabic Neural Machine Translation
Title | First Result on Arabic Neural Machine Translation |
Authors | Amjad Almahairi, Kyunghyun Cho, Nizar Habash, Aaron Courville |
Abstract | Neural machine translation has become a major alternative to widely used phrase-based statistical machine translation. We notice, however, that much of the research on neural machine translation has focused on European languages despite its language-agnostic nature. In this paper, we apply neural machine translation to the task of Arabic translation (Ar<->En) and compare it against a standard phrase-based translation system. We run an extensive comparison using various configurations for preprocessing Arabic script and show that the phrase-based and neural translation systems perform comparably to each other and that proper preprocessing of Arabic script has a similar effect on both systems. We observe, however, that neural machine translation significantly outperforms the phrase-based system on an out-of-domain test set, making it attractive for real-world deployment. |
Tasks | Machine Translation |
Published | 2016-06-08 |
URL | http://arxiv.org/abs/1606.02680v1 |
http://arxiv.org/pdf/1606.02680v1.pdf | |
PWC | https://paperswithcode.com/paper/first-result-on-arabic-neural-machine |
Repo | |
Framework | |
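The paper's comparison hinges on how the Arabic script is preprocessed. The sketch below shows only the simplest, character-level orthographic normalization commonly applied to Arabic text (diacritic and tatweel removal, alef/ya/ta-marbuta folding); the morphological tokenization schemes the paper also evaluates are not covered.

```python
import re

# Common orthographic normalization for Arabic text; only the simplest of the
# preprocessing schemes the paper compares.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")   # harakat, sukun, dagger alef
TATWEEL = "\u0640"

def normalize_arabic(text):
    text = DIACRITICS.sub("", text)                  # drop short-vowel marks
    text = text.replace(TATWEEL, "")                 # drop elongation character
    for alef in ("\u0622", "\u0623", "\u0625"):      # alef madda / hamza above / below
        text = text.replace(alef, "\u0627")          # -> bare alef
    text = text.replace("\u0649", "\u064A")          # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")          # ta marbuta -> ha
    return text
```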
Fractal Dimension Invariant Filtering and Its CNN-based Implementation
Title | Fractal Dimension Invariant Filtering and Its CNN-based Implementation |
Authors | Hongteng Xu, Junchi Yan, Nils Persson, Weiyao Lin, Hongyuan Zha |
Abstract | Fractal analysis has been widely used in computer vision, especially in texture image processing and texture analysis. The key concept of fractal-based image models is the fractal dimension, which is invariant under bi-Lipschitz transformations of an image and is thus capable of robustly representing its intrinsic structural information. However, the invariance of the fractal dimension generally does not hold after filtering, which limits the application of fractal-based image models. In this paper, we propose a novel fractal dimension invariant filtering (FDIF) method, extending the invariance of the fractal dimension to filtering operations. Utilizing the notion of local self-similarity, we first develop a local fractal model for images. By adding a nonlinear post-processing step behind anisotropic filter banks, we demonstrate that the proposed filtering method is capable of preserving the local invariance of the fractal dimension of an image. Meanwhile, we show that the FDIF method can be approximately re-instantiated via a CNN-based architecture, where the convolution layer extracts the anisotropic structure of the image and the nonlinear layer enhances the structure by preserving the local fractal dimension. The proposed filtering method provides a novel geometric interpretation of CNN-based image models. Focusing on a challenging image processing task, detecting complicated curves in texture-like images, the proposed method obtains results superior to state-of-the-art approaches. |
Tasks | Texture Classification |
Published | 2016-03-19 |
URL | http://arxiv.org/abs/1603.06036v3 |
http://arxiv.org/pdf/1603.06036v3.pdf | |
PWC | https://paperswithcode.com/paper/fractal-dimension-invariant-filtering-and-its |
Repo | |
Framework | |
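The fractal dimension at the heart of FDIF is usually estimated by box counting: the number of occupied boxes N(s) scales as s^(-D). The sketch below estimates D for a binary local patch; the anisotropic filter bank and the nonlinear post-processing step that make the filtering dimension-preserving are the paper's contribution and are not reproduced.

```python
import numpy as np

def box_counting_dimension(patch):
    """Estimate the fractal (box-counting) dimension of a square binary patch:
    the slope of log N(s) versus log(1/s), where N(s) counts occupied s x s boxes."""
    size = patch.shape[0]
    scales = [s for s in (1, 2, 4, 8, 16) if s < size]
    counts = []
    for s in scales:
        n = size // s
        blocks = patch[:n * s, :n * s].reshape(n, s, n, s)
        occupied = blocks.any(axis=(1, 3)).sum()      # boxes containing structure
        counts.append(max(occupied, 1))
    slope, _ = np.polyfit(np.log(1.0 / np.array(scales)), np.log(counts), 1)
    return slope
```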
Evaluating semantic models with word-sentence relatedness
Title | Evaluating semantic models with word-sentence relatedness |
Authors | Kimberly Glasgow, Matthew Roos, Amy Haufler, Mark Chevillet, Michael Wolmetz |
Abstract | Semantic textual similarity (STS) systems are designed to encode and evaluate the semantic similarity between words, phrases, sentences, and documents. One method for assessing the quality or authenticity of semantic information encoded in these systems is by comparison with human judgments. A data set for evaluating semantic models was developed consisting of 775 English word-sentence pairs, each annotated for semantic relatedness by human raters engaged in a Maximum Difference Scaling (MDS) task, as well as a faster alternative task. As a sample application of this relatedness data, behavior-based relatedness was compared to the relatedness computed via four off-the-shelf STS models: n-gram, Latent Semantic Analysis (LSA), Word2Vec, and UMBC Ebiquity. Some STS models captured much of the variance in the human judgments collected, but they were not sensitive to the implicatures and entailments that were processed and considered by the participants. All text stimuli and judgment data have been made freely available. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2016-03-23 |
URL | http://arxiv.org/abs/1603.07253v2 |
http://arxiv.org/pdf/1603.07253v2.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-semantic-models-with-word-sentence |
Repo | |
Framework | |
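Comparing a semantic model against the human judgments collected here boils down to a rank correlation between model similarity scores and the behavioural relatedness ratings. In the sketch below, averaged word vectors stand in for the STS models the paper evaluates; the embedding table and pair format are assumptions.

```python
import numpy as np
from scipy.stats import spearmanr

def sentence_vector(tokens, embeddings):
    """Average word vectors; a simple stand-in for the STS models the paper compares."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else None

def evaluate_relatedness(pairs, human_scores, embeddings):
    """Spearman correlation between model similarity and human relatedness judgments.

    pairs        : list of (word, sentence_tokens)
    human_scores : list of human relatedness ratings, same order as pairs
    embeddings   : dict word -> vector (e.g. Word2Vec vectors)
    """
    model_scores = []
    for word, sent in pairs:
        w, s = embeddings.get(word), sentence_vector(sent, embeddings)
        if w is None or s is None:
            model_scores.append(0.0)       # out-of-vocabulary fallback
            continue
        cos = float(np.dot(w, s) / (np.linalg.norm(w) * np.linalg.norm(s)))
        model_scores.append(cos)
    rho, _ = spearmanr(model_scores, human_scores)
    return rho
```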
Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-related Applications
Title | Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-related Applications |
Authors | Ciprian Corneanu, Marc Oliu, Jeffrey F. Cohn, Sergio Escalera |
Abstract | Facial expressions are an important way through which humans interact socially. Building a system capable of automatically recognizing facial expressions from images and video has been an intense field of study in recent years. Interpreting such expressions remains challenging, and much research is needed about the way they relate to human affect. This paper presents a general overview of automatic RGB, 3D, thermal and multimodal facial expression analysis. We define a new taxonomy for the field, encompassing all steps from face detection to facial expression recognition, and describe and classify the state-of-the-art methods accordingly. We also present the important datasets and the benchmarking of the most influential methods. We conclude with a general discussion about trends, important questions and future lines of research. |
Tasks | Face Detection, Facial Expression Recognition |
Published | 2016-06-10 |
URL | http://arxiv.org/abs/1606.03237v1 |
http://arxiv.org/pdf/1606.03237v1.pdf | |
PWC | https://paperswithcode.com/paper/survey-on-rgb-3d-thermal-and-multimodal |
Repo | |
Framework | |
Proving the Incompatibility of Efficiency and Strategyproofness via SMT Solving
Title | Proving the Incompatibility of Efficiency and Strategyproofness via SMT Solving |
Authors | Florian Brandl, Felix Brandt, Manuel Eberl, Christian Geist |
Abstract | Two important requirements when aggregating the preferences of multiple agents are that the outcome should be economically efficient and the aggregation mechanism should not be manipulable. In this paper, we provide a computer-aided proof of a sweeping impossibility using these two conditions for randomized aggregation mechanisms. More precisely, we show that every efficient aggregation mechanism can be manipulated for all expected utility representations of the agents’ preferences. This settles an open problem and strengthens a number of existing theorems, including statements that were shown within the special domain of assignment. Our proof is obtained by formulating the claim as a satisfiability problem over predicates from real-valued arithmetic, which is then checked using an SMT (satisfiability modulo theories) solver. In order to verify the correctness of the result, a minimal unsatisfiable set of constraints returned by the SMT solver was translated back into a proof in higher-order logic, which was automatically verified by an interactive theorem prover. To the best of our knowledge, this is the first application of SMT solvers in computational social choice. |
Tasks | |
Published | 2016-04-19 |
URL | http://arxiv.org/abs/1604.05692v4 |
http://arxiv.org/pdf/1604.05692v4.pdf | |
PWC | https://paperswithcode.com/paper/proving-the-incompatibility-of-efficiency-and |
Repo | |
Framework | |
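The proof workflow described in the abstract (encode the claim as constraints over real arithmetic, let an SMT solver report unsatisfiability, then extract an unsatisfiable core for independent verification) can be illustrated on a toy scale with Z3's Python bindings. The three constraints below are purely illustrative and have nothing to do with the paper's actual encoding of efficiency and strategyproofness.

```python
from z3 import Real, Solver, unsat

x, y = Real("x"), Real("y")

s = Solver()
s.set(unsat_core=True)
# Three jointly inconsistent linear constraints over the reals (toy example only).
s.assert_and_track(x + y >= 1, "c1")
s.assert_and_track(x <= 0, "c2")
s.assert_and_track(y <= 0, "c3")

if s.check() == unsat:
    # An unsatisfiable subset of the tracked constraints; the paper further
    # reduces such a core and re-verifies it in an interactive theorem prover.
    print("unsat core:", s.unsat_core())
```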
mdBrief - A Fast Online Adaptable, Distorted Binary Descriptor for Real-Time Applications Using Calibrated Wide-Angle Or Fisheye Cameras
Title | mdBrief - A Fast Online Adaptable, Distorted Binary Descriptor for Real-Time Applications Using Calibrated Wide-Angle Or Fisheye Cameras |
Authors | Steffen Urban, Stefan Hinz |
Abstract | Fast binary descriptors form the core of many vision-based applications with real-time demands, such as object detection, Visual Odometry or SLAM. Commonly it is assumed that the acquired images, and thus the patches extracted around keypoints, originate from a perspective projection, ignoring image distortion or entirely different types of projections such as omnidirectional or fisheye. Usually the deviations from a perfect perspective projection are corrected by undistortion. The latter, however, introduces severe artifacts as the camera's field of view gets larger. In this paper, we propose a distorted and masked version of the BRIEF descriptor for calibrated cameras. Instead of correcting the distortion holistically, we distort the binary tests and thus adapt the descriptor to different image regions. |
Tasks | Object Detection, Visual Odometry |
Published | 2016-10-25 |
URL | http://arxiv.org/abs/1610.07804v1 |
http://arxiv.org/pdf/1610.07804v1.pdf | |
PWC | https://paperswithcode.com/paper/mdbrief-a-fast-online-adaptable-distorted |
Repo | |
Framework | |
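The core idea, distorting the binary tests instead of undistorting the image, can be sketched as follows. The `project` callback standing in for the calibrated (e.g. fisheye) camera model is an assumed interface rather than an API from the paper's code, and the descriptor masking mentioned in the title is omitted.

```python
import numpy as np

def distorted_brief(image, keypoint, tests, project):
    """BRIEF-style binary descriptor whose test pattern is distorted per keypoint.

    tests   : (n_bits, 2, 2) pairs of test offsets defined on the ideal
              (undistorted) image plane around the keypoint
    project : callable mapping (keypoint, ideal-plane offset) to pixel coordinates
              under the calibrated camera model -- an assumed interface
    Instead of undistorting the whole patch, each binary test is warped into the
    distorted image, which is the core idea behind this kind of descriptor.
    """
    h, w = image.shape[:2]
    bits = np.zeros(len(tests), dtype=np.uint8)
    for i, (a, b) in enumerate(tests):
        (ax, ay), (bx, by) = project(keypoint, a), project(keypoint, b)
        ax, ay = int(np.clip(ax, 0, w - 1)), int(np.clip(ay, 0, h - 1))
        bx, by = int(np.clip(bx, 0, w - 1)), int(np.clip(by, 0, h - 1))
        bits[i] = 1 if image[ay, ax] < image[by, bx] else 0   # intensity comparison
    return np.packbits(bits)                                   # compact binary string
```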