January 31, 2020

Paper Group ANR 148

SceneFlowFields++: Multi-frame Matching, Visibility Prediction, and Robust Interpolation for Scene Flow Estimation. Transport Model for Feature Extraction. Exploring Representativeness and Informativeness for Active Learning. Fast Glare Detection in Document Images. On The Radon–Nikodym Spectral Approach With Optimal Clustering. Exponential-Binary …

SceneFlowFields++: Multi-frame Matching, Visibility Prediction, and Robust Interpolation for Scene Flow Estimation


Title	SceneFlowFields++: Multi-frame Matching, Visibility Prediction, and Robust Interpolation for Scene Flow Estimation
Authors	René Schuster, Oliver Wasenmüller, Christian Unger, Georg Kuschk, Didier Stricker
Abstract	State-of-the-art scene flow algorithms pursue the conflicting targets of accuracy, run time, and robustness. With the successful concept of pixel-wise matching and sparse-to-dense interpolation, we push the limits of scene flow estimation. Avoiding strong assumptions on the domain or the problem yields a more robust algorithm. This algorithm is fast because we avoid explicit regularization during matching, which allows an efficient computation. Using image information from multiple time steps and explicit visibility prediction based on previous results, we achieve competitive performances on different data sets. Our contributions and results are evaluated in comparative experiments. Overall, we present an accurate scene flow algorithm that is faster and more generic than any individual benchmark leader.
Tasks	Scene Flow Estimation
Published	2019-02-26
URL	https://arxiv.org/abs/1902.10099v2
PDF	https://arxiv.org/pdf/1902.10099v2.pdf
PWC	https://paperswithcode.com/paper/sceneflowfields-multi-frame-matching
Repo
Framework
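
The abstract relies on sparse-to-dense interpolation of pixel-wise matches. The sketch below illustrates only that general idea with plain inverse-distance weighting over the k nearest matches. It is not the robust interpolation scheme the paper proposes. The function name, the (row, col, value) match format, and the defaults k=4 and power=2 are all hypothetical. A real scene flow field has several components, so you would run it once per component.

```python
import math
from typing import List, Tuple

Match = Tuple[int, int, float]  # hypothetical format: (row, col, value of one flow component)

def idw_densify(matches: List[Match], height: int, width: int,
                k: int = 4, power: float = 2.0) -> List[List[float]]:
    """Fill a height x width grid from sparse matches by inverse-distance weighting
    over the k nearest matches (brute force, for illustration only)."""
    if not matches:
        raise ValueError("need at least one sparse match")
    dense = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # The k matches closest to this pixel (squared Euclidean distance).
            nearest = sorted(matches, key=lambda m: (m[0] - r) ** 2 + (m[1] - c) ** 2)[:k]
            num, den = 0.0, 0.0
            for mr, mc, val in nearest:
                d = math.hypot(mr - r, mc - c)
                if d == 0.0:
                    # The pixel is itself a match, so copy its value exactly.
                    num, den = val, 1.0
                    break
                w = 1.0 / d ** power  # closer matches get larger weights
                num += w * val
                den += w
            dense[r][c] = num / den
    return dense
```

Brute-force neighbor search is quadratic in practice. A real implementation would use a spatial index and, as the paper's title suggests, a more robust weighting than raw distance.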

Transport Model for Feature Extraction


Title	Transport Model for Feature Extraction
Authors	Wojciech Czaja, Dong Dong, Pierre-Emmanuel Jabin, Franck Olivier Ndjakou Njeunje
Abstract	We present a new feature extraction method for complex and large datasets, based on the concept of transport operators on graphs. The proposed approach generalizes and extends the many existing data representation methodologies built upon diffusion processes, to a new domain where dynamical systems play a key role. The main advantage of this approach comes from the ability to exploit different relationships than those arising in the context of e.g., Graph Laplacians. Fundamental properties of the transport operators are proved. We demonstrate the flexibility of the method by introducing several diverse examples of transformations. We close the paper with a series of computational experiments and applications to the problem of classification of hyperspectral satellite imagery, to illustrate the practical implications of our algorithm and its ability to quantify new aspects of relationships within complicated datasets.
Tasks
Published	2019-10-31
URL	https://arxiv.org/abs/1910.14543v1
PDF	https://arxiv.org/pdf/1910.14543v1.pdf
PWC	https://paperswithcode.com/paper/transport-model-for-feature-extraction
Repo
Framework

Exploring Representativeness and Informativeness for Active Learning


Title	Exploring Representativeness and Informativeness for Active Learning
Authors	Bo Du, Zengmao Wang, Lefei Zhang, Liangpei Zhang, Wei Liu, Jialie Shen, Dacheng Tao
Abstract	How can we find a general way to choose the most suitable samples for training a classifier, even with very limited prior information? Active learning, which can be regarded as an iterative optimization procedure, plays a key role in constructing a refined training set that improves classification performance in a variety of applications, such as text analysis, image recognition, and social network modeling. Although combining the representativeness and informativeness of samples has proven promising for active sampling, state-of-the-art methods perform well only under certain data structures. Can we then find a way to fuse the two active sampling criteria without any assumption on the data? This paper proposes a general active learning framework that effectively fuses the two criteria. Inspired by a two-sample discrepancy problem, triple measures are elaborately designed to guarantee that the query samples not only possess the representativeness of the unlabeled data but also reveal the diversity of the labeled data. Any appropriate similarity measure can be employed to construct the triple measures. Meanwhile, an uncertainty measure is leveraged to generate the informativeness criterion, which can be carried out in different ways. Rooted in this framework, a practical active learning algorithm is proposed, which exploits a radial basis function together with the estimated probabilities to construct the triple measures, and a modified Best-versus-Second-Best strategy to construct the uncertainty measure. Experimental results on benchmark datasets demonstrate that our algorithm consistently achieves superior performance over the state-of-the-art active learning algorithms.
Tasks	Active Learning
Published	2019-04-14
URL	http://arxiv.org/abs/1904.06685v1
PDF	http://arxiv.org/pdf/1904.06685v1.pdf
PWC	https://paperswithcode.com/paper/exploring-representativeness-and
Repo
Framework
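
The informativeness half of the framework uses a modified Best-versus-Second-Best (BvSB) strategy. The sketch below shows only the standard, unmodified BvSB margin, which is the gap between a sample's two largest predicted class probabilities. Samples with the smallest margin are queried first. It leaves out the paper's modification and its RBF-based triple measures for representativeness. The function names are hypothetical.

```python
from typing import List, Sequence

def bvsb_margin(probs: Sequence[float]) -> float:
    """Gap between the two largest class probabilities.
    A small margin means the classifier is torn between two classes."""
    if len(probs) < 2:
        raise ValueError("need probabilities for at least two classes")
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def query_most_uncertain(pool: List[Sequence[float]], batch: int = 1) -> List[int]:
    """Indices of the `batch` unlabeled samples with the smallest BvSB margin."""
    order = sorted(range(len(pool)), key=lambda i: bvsb_margin(pool[i]))
    return order[:batch]
```

For example, with predicted probabilities `[[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.6, 0.3, 0.1]]`, the second sample (margin 0.05) would be queried first.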

Fast Glare Detection in Document Images


Title	Fast Glare Detection in Document Images
Authors	Dmitry Rodin, Nikita Orlov
Abstract	Glare is a phenomenon that occurs when the scene contains a light source or a reflection of one. This luminescence can hide useful information in the image, making text recognition virtually impossible. In this paper, we propose an approach to detect glare in images taken by users via mobile devices. Our method divides the document into blocks and collects luminance features from the original image and black-white stroke histograms from the binarized image. Finally, glare is detected using a convolutional neural network on the aforementioned histograms and luminance features. The network consists of several feature extraction blocks, one for each type of input, and a detection block, which calculates the resulting glare heatmap based on the output of the extraction part. The proposed solution detects glare with high recall and F-score.
Tasks
Published	2019-10-24
URL	https://arxiv.org/abs/1911.05189v1
PDF	https://arxiv.org/pdf/1911.05189v1.pdf
PWC	https://paperswithcode.com/paper/fast-glare-detection-in-document-images
Repo
Framework
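
The method starts by dividing the document into blocks and collecting luminance features per block. The abstract does not say which features are used. The two below are assumptions chosen for illustration: mean luma and the fraction of near-saturated pixels. The BT.601 luma weights and the threshold of 240 are also assumptions. The binarized stroke histograms and the CNN are omitted, and the image format (nested lists of RGB tuples) is hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

def luma(p: RGB) -> float:
    # ITU-R BT.601 luma weights.
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b

def block_luminance_features(img: List[List[RGB]], block: int,
                             bright: float = 240.0) -> List[List[Tuple[float, float]]]:
    """Split an image into block x block tiles and return, per tile,
    (mean luma, fraction of pixels whose luma is >= `bright`)."""
    h, w = len(img), len(img[0])
    feats = []
    for top in range(0, h, block):
        row = []
        for left in range(0, w, block):
            vals = [luma(img[y][x])
                    for y in range(top, min(top + block, h))
                    for x in range(left, min(left + block, w))]
            mean = sum(vals) / len(vals)
            frac = sum(v >= bright for v in vals) / len(vals)
            row.append((mean, frac))
        feats.append(row)
    return feats
```

Each tile's feature pair could then be stacked into a low-resolution map, which matches the kind of per-block input from which a glare heatmap is predicted.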

On The Radon–Nikodym Spectral Approach With Optimal Clustering


Title	On The Radon–Nikodym Spectral Approach With Optimal Clustering
Authors	Vladislav Gennadievich Malyshkin
Abstract	Problems of interpolation, classification, and clustering are considered. In the tenets of the Radon–Nikodym approach $\langle f(\mathbf{x})\psi^2 \rangle / \langle\psi^2\rangle$, where $\psi(\mathbf{x})$ is a linear function of the input attributes, all the answers are obtained from a generalized eigenproblem $\| f | \psi^{[i]}\rangle = \lambda^{[i]} \| \psi^{[i]}\rangle$. The solution to the interpolation problem is a regular Radon–Nikodym derivative. The solution to the classification problem requires prior and posterior probabilities, which are obtained using the Lebesgue quadrature[1] technique. Whereas in a Bayesian approach new observations change only the outcome probabilities, in the Radon–Nikodym approach not only the outcome probabilities but also the probability space $|\psi^{[i]}\rangle$ change with new observations. This is a remarkable feature of the approach: both the probabilities and the probability space are constructed from the data. The Lebesgue quadrature technique can also be applied to the optimal clustering problem, which is solved by constructing a Gaussian quadrature on the Lebesgue measure. A distinguishing feature of the Radon–Nikodym approach is the knowledge of the invariant group: all the answers are invariant with respect to any non-degenerate linear transform of the components of the input vector $\mathbf{x}$. A software product implementing the algorithms of interpolation, classification, and optimal clustering is available from the authors.
Tasks
Published	2019-06-02
URL	https://arxiv.org/abs/1906.00460v9
PDF	https://arxiv.org/pdf/1906.00460v9.pdf
PWC	https://paperswithcode.com/paper/190600460
Repo
Framework
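
The step from the ratio $\langle f\psi^2\rangle/\langle\psi^2\rangle$ to a generalized eigenproblem can be made explicit. Write $\psi$ as a linear combination of the input attributes $x_k$. The ratio then becomes a generalized Rayleigh quotient, and its stationary points are the eigenvectors. The symbols $\alpha_k$, $F$ and $G$ are notation introduced here for illustration. They are not taken from the abstract.

```latex
\psi(\mathbf{x}) = \sum_{k=1}^{n} \alpha_k x_k, \qquad
G_{jk} = \langle x_j x_k \rangle, \qquad
F_{jk} = \langle f\, x_j x_k \rangle

\frac{\langle f \psi^2 \rangle}{\langle \psi^2 \rangle}
  = \frac{\sum_{j,k} \alpha_j F_{jk} \alpha_k}{\sum_{j,k} \alpha_j G_{jk} \alpha_k}
  = \frac{\boldsymbol{\alpha}^{\top} F \boldsymbol{\alpha}}{\boldsymbol{\alpha}^{\top} G \boldsymbol{\alpha}}

\nabla_{\boldsymbol{\alpha}} \left( \frac{\boldsymbol{\alpha}^{\top} F \boldsymbol{\alpha}}{\boldsymbol{\alpha}^{\top} G \boldsymbol{\alpha}} \right) = 0
\;\Longleftrightarrow\;
F \boldsymbol{\alpha}^{[i]} = \lambda^{[i]} G \boldsymbol{\alpha}^{[i]}
```

This form also makes the invariance claim concrete. Substitute $\mathbf{x}' = T\mathbf{x}$ with an invertible $T$. Then $F \to T F T^{\top}$ and $G \to T G T^{\top}$. The eigenproblem $T F T^{\top}\boldsymbol{\beta} = \lambda\, T G T^{\top}\boldsymbol{\beta}$ is equivalent to $F (T^{\top}\boldsymbol{\beta}) = \lambda\, G (T^{\top}\boldsymbol{\beta})$, so the eigenvalues $\lambda^{[i]}$ do not change.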

Exponential-Binary State-Space Search


Title	Exponential-Binary State-Space Search
Authors	Nathan Sturtevant, Malte Helmert
Abstract	Iterative deepening search is used in applications where the best cost bound for state-space search is unknown. The iterative deepening process is used to avoid overshooting the appropriate cost bound and doing too much work as a result. However, iterative deepening search also does too much work if the cost bound grows too slowly. This paper proposes a new framework for iterative deepening search called exponential-binary state-space search. The approach interleaves exponential and binary searches to find the desired cost bound, reducing the worst-case overhead from polynomial to logarithmic. Exponential-binary search can be used with bounded depth-first search to improve the worst-case performance of IDA* and with breadth-first heuristic search to improve the worst-case performance of search with inconsistent heuristics.
Tasks
Published	2019-06-07
URL	https://arxiv.org/abs/1906.02912v1
PDF	https://arxiv.org/pdf/1906.02912v1.pdf
PWC	https://paperswithcode.com/paper/exponential-binary-state-space-search
Repo
Framework
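
The core idea of the abstract is to interleave an exponential search with a binary search when looking for the cost bound. The sketch below shows that idea in a simplified form. It has several assumptions:

- `solves(bound)` is a hypothetical stand-in for one bounded search, such as a cost-bounded DFS pass.
- `solves` is monotone in the bound.
- Some bound eventually succeeds.
- Integer costs.

It ignores how the work done inside each bounded search is budgeted, which matters for the paper's worst-case analysis.

```python
from typing import Callable, Tuple

def exponential_binary_bound(solves: Callable[[int], bool], start: int = 1) -> Tuple[int, int]:
    """Smallest integer cost bound C >= start for which solves(C) is True.

    `solves(bound)` is assumed monotone (success at a bound implies success at
    every larger bound), and some bound is assumed to succeed; otherwise the
    exponential phase never terminates. Returns (C, number_of_bounded_searches)."""
    if start < 1:
        raise ValueError("start must be >= 1 so that doubling makes progress")
    calls = 0
    lo, hi = start - 1, start  # bounds <= lo are known (or assumed) to fail
    # Exponential phase: double the bound until a bounded search succeeds.
    while True:
        calls += 1
        if solves(hi):
            break
        lo, hi = hi, hi * 2
    # Binary phase: the answer lies in (lo, hi]; halve that interval each step.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        calls += 1
        if solves(mid):
            hi = mid
        else:
            lo = mid
    return hi, calls
```

With `solves = lambda b: b >= 37`, the sketch returns `(37, 12)`: seven doubling probes (1, 2, ..., 64) followed by five bisection steps. Raising the bound by one each time would take 37 bounded searches to reach the same bound.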

A machine vision meta-algorithm for automated recognition of underwater objects using sidescan sonar imagery


Title	A machine vision meta-algorithm for automated recognition of underwater objects using sidescan sonar imagery
Authors	Guillaume Labbe-Morissette, Sylvain Gauthier
Abstract	This paper details a new method to recognize and detect underwater objects in real-time sidescan sonar imagery streams, with case studies of applications for underwater archeology and ghost fishing gear retrieval. We first synthesize images from sidescan data and apply geometric and radiometric corrections, then use 2D feature detection algorithms to identify point clouds of descriptive visual microfeatures, such as corners and edges, in the sonar images. We then apply a clustering algorithm to the feature point clouds to group feature sets into regions of interest and reject false positives, yielding a georeferenced inventory of objects.
Tasks
Published	2019-09-17
URL	https://arxiv.org/abs/1909.07763v1
PDF	https://arxiv.org/pdf/1909.07763v1.pdf
PWC	https://paperswithcode.com/paper/a-machine-vision-meta-algorithm-for-automated
Repo
Framework
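
The final stage groups feature points into regions of interest and rejects false positives. The abstract does not name the clustering algorithm. The sketch below assumes a brute-force DBSCAN, which is a swapped-in choice and not necessarily the authors'. It also assumes a hypothetical rule that drops clusters with fewer than `min_features` points as likely false positives. Each surviving cluster is reported as a bounding box in the same coordinates as the input points.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited

    def neighbors(i: int) -> List[int]:
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point: keep growing the cluster
    # Every point is labeled by now; the fallback only satisfies the type.
    return [lab if lab is not None else -1 for lab in labels]

def regions_of_interest(points: List[Point], eps: float, min_pts: int,
                        min_features: int) -> List[Box]:
    """Cluster feature points and keep bounding boxes of clusters with enough features."""
    members: Dict[int, List[Point]] = {}
    for p, lab in zip(points, dbscan(points, eps, min_pts)):
        if lab != -1:
            members.setdefault(lab, []).append(p)
    boxes = []
    for pts in members.values():
        if len(pts) < min_features:
            continue  # too few features: treat as a likely false positive
        xs = [x for x, _ in pts]
        ys = [y for _, y in pts]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

Georeferencing would then map each box from image coordinates to geographic coordinates using the sonar's navigation data, which this sketch does not model.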

Predicting Novel Views Using Generative Adversarial Query Network


Title	Predicting Novel Views Using Generative Adversarial Query Network
Authors	Phong Nguyen-Ha, Lam Huynh, Esa Rahtu, Janne Heikkila
Abstract	Predicting a novel view of a scene from an arbitrary number of observations is challenging for computers as well as for humans. This paper introduces the Generative Adversarial Query Network (GAQN), a general learning framework for novel view synthesis that combines the Generative Query Network (GQN) and Generative Adversarial Networks (GANs). The conventional GQN encodes input views into a latent representation that is used to generate a new view through a recurrent variational decoder. The proposed GAQN builds on this work by adding two novel aspects. First, we extend the current GQN architecture with an adversarial loss function to improve visual quality and convergence speed. Second, we introduce a feature-matching loss function to stabilize the training procedure. The experiments demonstrate that GAQN produces high-quality results and converges faster than the conventional approach.
Tasks	Novel View Synthesis
Published	2019-04-10
URL	http://arxiv.org/abs/1904.05124v1
PDF	http://arxiv.org/pdf/1904.05124v1.pdf
PWC	https://paperswithcode.com/paper/predicting-novel-views-using-generative
Repo
Framework
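
The abstract does not give the exact form of GAQN's feature-matching loss. A common form, assumed here, is the squared L2 distance between the mean intermediate discriminator features of a real batch and of a generated batch. The sketch below computes only that arithmetic in plain Python. In training, the features would come from the discriminator, and the loss would be backpropagated into the generator. The function names are hypothetical.

```python
from typing import List, Sequence

def mean_features(batch: Sequence[Sequence[float]]) -> List[float]:
    """Average a batch of feature vectors (e.g. one intermediate discriminator layer)."""
    n, dim = len(batch), len(batch[0])
    return [sum(v[d] for v in batch) / n for d in range(dim)]

def feature_matching_loss(real_feats: Sequence[Sequence[float]],
                          fake_feats: Sequence[Sequence[float]]) -> float:
    """Squared L2 distance between mean real and mean generated features."""
    mr = mean_features(real_feats)
    mf = mean_features(fake_feats)
    return sum((a - b) ** 2 for a, b in zip(mr, mf))
```

This loss matches batch statistics rather than fooling the discriminator on each sample. That is the usual reason it is credited with making adversarial training more stable.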

Multiview Hessian Regularization for Image Annotation


Title	Multiview Hessian Regularization for Image Annotation
Authors	Weifeng Liu, Dacheng Tao
Abstract	The rapid development of computer hardware and Internet technology makes large-scale, data-dependent models computationally tractable and opens a bright avenue for annotating images through innovative machine learning algorithms. Semi-supervised learning (SSL) has consequently received intensive attention in recent years and has been successfully deployed in image annotation. One representative SSL method is Laplacian regularization (LR), which smooths the conditional distribution for classification along the manifold encoded in the graph Laplacian. However, it has been observed that LR biases the classification function towards a constant function, which can result in poor generalization. In addition, LR is developed to handle uniformly distributed data (or single-view data), although instances or objects, such as images and videos, are usually represented by multiview features, such as color, shape, and texture. In this paper, we present multiview Hessian regularization (mHR) to address these two problems in LR-based image annotation. In particular, mHR optimally combines multiple Hessian regularizations, each obtained from a particular view of the instances, and steers the classification function toward one that varies linearly along the data manifold. We apply mHR to kernel least squares and support vector machines as two examples for image annotation. Extensive experiments on the PASCAL VOC’07 dataset validate the effectiveness of mHR by comparing it with baseline algorithms, including LR and HR.
Tasks
Published	2019-04-23
URL	http://arxiv.org/abs/1904.10100v1
PDF	http://arxiv.org/pdf/1904.10100v1.pdf
PWC	https://paperswithcode.com/paper/multiview-hessian-regularization-for-image
Repo
Framework
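
The contrast the abstract draws between LR and Hessian regularization shows up even on a 1-D path graph. This is a simplified analogue and not mHR itself. On a path graph with unit weights, the Laplacian penalty f^T L f is the sum of squared first differences, so it vanishes only on constant functions. A discrete Hessian penalty (squared second differences) also vanishes on linear functions. The function names are hypothetical.

```python
from typing import Sequence

def laplacian_energy(f: Sequence[float]) -> float:
    """f^T L f for the unit-weight path graph: sum of squared first differences."""
    return sum((f[i + 1] - f[i]) ** 2 for i in range(len(f) - 1))

def hessian_energy(f: Sequence[float]) -> float:
    """1-D discrete analogue of the Hessian energy: sum of squared second differences."""
    return sum((f[i + 1] - 2 * f[i] + f[i - 1]) ** 2 for i in range(1, len(f) - 1))
```

For the linear signal `[0, 1, 2, 3]`, the Laplacian penalty is 3 but the Hessian penalty is 0. Minimizing the Laplacian penalty therefore pulls the classifier toward a constant, while the Hessian penalty leaves linear variation unpenalized.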

Learning Ising Models with Independent Failures


Title	Learning Ising Models with Independent Failures
Authors	Surbhi Goel, Daniel M. Kane, Adam R. Klivans
Abstract	We give the first efficient algorithm for learning the structure of an Ising model that tolerates independent failures; that is, each entry of the observed sample is missing with some unknown probability p. Our algorithm matches the essentially optimal runtime and sample complexity bounds of recent work for learning Ising models due to Klivans and Meka (2017). We devise a novel unbiased estimator for the gradient of the Interaction Screening Objective (ISO) due to Vuffray et al. (2016) and apply a stochastic multiplicative gradient descent algorithm to minimize this objective. Solutions to this minimization recover the neighborhood information of the underlying Ising model on a node by node basis.
Tasks
Published	2019-02-13
URL	http://arxiv.org/abs/1902.04728v1
PDF	http://arxiv.org/pdf/1902.04728v1.pdf
PWC	https://paperswithcode.com/paper/learning-ising-models-with-independent
Repo
Framework

Landmark-Based Approaches for Goal Recognition as Planning


Title	Landmark-Based Approaches for Goal Recognition as Planning
Authors	Ramon Fraga Pereira, Nir Oren, Felipe Meneguzzi
Abstract	The task of recognizing goals and plans from missing and full observations can be done efficiently by using automated planning techniques. In many applications, it is important to recognize goals and plans not only accurately, but also quickly. To address this challenge, we develop novel goal recognition approaches based on planning techniques that rely on planning landmarks. In automated planning, landmarks are properties (or actions) that cannot be avoided to achieve a goal. We show the applicability of a number of planning techniques with an emphasis on landmarks for goal and plan recognition tasks in two settings: (1) we use the concept of landmarks to develop goal recognition heuristics; and (2) we develop a landmark-based filtering method to refine existing planning-based goal and plan recognition approaches. These recognition approaches are empirically evaluated in experiments over several classical planning domains. We show that our goal recognition approaches yield not only accuracy comparable to (and often higher than) other state-of-the-art techniques, but also substantially faster recognition than those techniques.
Tasks
Published	2019-04-26
URL	https://arxiv.org/abs/1904.11739v2
PDF	https://arxiv.org/pdf/1904.11739v2.pdf
PWC	https://paperswithcode.com/paper/landmark-based-approaches-for-goal
Repo
Framework
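
One way landmarks can drive a goal recognition heuristic is to score each candidate goal by the fraction of its landmarks that the observations have already achieved. The sketch below follows that idea in simplified form. It is not the paper's exact heuristic. It assumes each goal's landmarks have already been extracted as string facts, and the goal and fact names are hypothetical.

```python
from typing import Dict, List, Set

def landmark_completion(goal_landmarks: Dict[str, Set[str]],
                        achieved: Set[str]) -> Dict[str, float]:
    """Fraction of each candidate goal's landmarks already achieved by the observations."""
    return {goal: (len(lms & achieved) / len(lms) if lms else 0.0)
            for goal, lms in goal_landmarks.items()}

def recognize(goal_landmarks: Dict[str, Set[str]], achieved: Set[str]) -> List[str]:
    """All candidate goals tied for the highest completion score."""
    scores = landmark_completion(goal_landmarks, achieved)
    best = max(scores.values())
    return sorted(g for g, s in scores.items() if s == best)
```

Each candidate needs only set intersections and no new planner calls. That is consistent with the abstract's point that landmark-based recognition can be much faster than approaches that plan for every candidate goal.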

Voice Mimicry Attacks Assisted by Automatic Speaker Verification


Title	Voice Mimicry Attacks Assisted by Automatic Speaker Verification
Authors	Ville Vestman, Tomi Kinnunen, Rosa González Hautamäki, Md Sahidullah
Abstract	In this work, we simulate a scenario where a publicly available ASV system is used to enhance mimicry attacks against another, closed-source ASV system. Specifically, ASV technology is used to perform a similarity search between the voices of recruited attackers (6) and potential target speakers (7,365) from the VoxCeleb corpora to find the closest targets for each of the attackers. In addition, we consider ‘median’, ‘furthest’, and ‘common’ targets to serve as reference points. Our goal is to gain insights into how well similarity rankings transfer from the attacker’s ASV system to the attacked ASV system, whether the attackers are able to improve their attacks by mimicking, and how the properties of the attackers’ voices change due to mimicking. We address these questions through ASV experiments, listening tests, and prosodic and formant analyses. For the ASV experiments, we use i-vector technology on the attacker side and x-vectors on the attacked side. For the listening tests, we recruit listeners through crowdsourcing. The results of the ASV experiments indicate that speaker similarity scores transfer well from one ASV system to another. Both the ASV experiments and the listening tests reveal that the mimicry attempts do not, in general, help in bringing attackers’ scores closer to the target’s. A detailed analysis shows that mimicking does not improve attacks when the natural voices of attackers and targets are similar to each other. The analysis of prosody and formants suggests that the attackers were able to considerably change their speaking rates when mimicking, but the changes in F0 and formants were modest. Overall, the results suggest that untrained impersonators do not pose a high threat to ASV systems, but the use of ASV systems to attack other ASV systems is a potential threat.
Tasks	Speaker Verification
Published	2019-06-03
URL	https://arxiv.org/abs/1906.01454v2
PDF	https://arxiv.org/pdf/1906.01454v2.pdf
PWC	https://paperswithcode.com/paper/voice-mimicry-attacks-assisted-by-automatic
Repo
Framework
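
The similarity search ranks potential targets by how close their voices are to each attacker's voice. It then picks the closest target along with reference targets. The sketch below assumes cosine similarity over fixed-length speaker embeddings. The attacker side actually uses i-vectors, and the paper's scoring backend may differ. It also assumes that the 'median' target is the middle of the similarity ranking. The embeddings and names are hypothetical, and 'common' targets are not modeled.

```python
import math
from typing import Dict, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; both vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def reference_targets(attacker: Sequence[float],
                      targets: Dict[str, Sequence[float]]) -> Dict[str, str]:
    """Rank targets by similarity to the attacker; pick closest, median, and furthest."""
    ranked = sorted(targets, key=lambda name: cosine(attacker, targets[name]), reverse=True)
    return {"closest": ranked[0], "median": ranked[len(ranked) // 2], "furthest": ranked[-1]}
```

With 7,365 targets, this brute-force ranking is cheap. Whether a ranking produced by one ASV system transfers to another is exactly what the paper's experiments test.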

Towards Unsupervised Image Captioning with Shared Multimodal Embeddings


Title	Towards Unsupervised Image Captioning with Shared Multimodal Embeddings
Authors	Iro Laina, Christian Rupprecht, Nassir Navab
Abstract	Understanding images without explicit supervision has become an important problem in computer vision. In this paper, we address image captioning by generating language descriptions of scenes without learning from annotated pairs of images and their captions. The core component of our approach is a shared latent space that is structured by visual concepts. In this space, the two modalities should be indistinguishable. A language model is first trained to encode sentences into semantically structured embeddings. Image features that are translated into this embedding space can be decoded into descriptions through the same language model, similarly to sentence embeddings. This translation is learned from weakly paired images and text using a loss robust to noisy assignments and a conditional adversarial component. Our approach makes it possible to exploit large text corpora outside the annotated distributions of image/caption data. Our experiments show that the proposed domain alignment learns a semantically meaningful representation that outperforms previous work.
Tasks	Image Captioning, Language Modelling, Sentence Embeddings
Published	2019-08-25
URL	https://arxiv.org/abs/1908.09317v1
PDF	https://arxiv.org/pdf/1908.09317v1.pdf
PWC	https://paperswithcode.com/paper/towards-unsupervised-image-captioning-with
Repo
Framework

Characterizing Collective Attention via Descriptor Context: A Case Study of Public Discussions of Crisis Events


Title	Characterizing Collective Attention via Descriptor Context: A Case Study of Public Discussions of Crisis Events
Authors	Ian Stewart, Diyi Yang, Jacob Eisenstein
Abstract	Social media datasets make it possible to rapidly quantify collective attention to emerging topics and breaking news, such as crisis events. Collective attention is typically measured by aggregate counts, such as the number of posts that mention a name or hashtag. But according to rationalist models of natural language communication, the collective salience of each entity will be expressed not only in how often it is mentioned, but in the form that those mentions take. This is because natural language communication is premised on (and customized to) the expectations that speakers and writers have about how their messages will be interpreted by the intended audience. We test this idea by conducting a large-scale analysis of public online discussions of breaking news events on Facebook and Twitter, focusing on five recent crisis events. We examine how people refer to locations, focusing specifically on contextual descriptors, such as “San Juan” versus “San Juan, Puerto Rico.” Rationalist accounts of natural language communication predict that such descriptors will be unnecessary (and therefore omitted) when the named entity is expected to have high prior salience to the reader. We find that the use of contextual descriptors is indeed associated with proxies for social and informational expectations, including macro-level factors like the location’s global salience and micro-level factors like audience engagement. We also find a consistent decrease in descriptor context use over the lifespan of each crisis event. These findings provide evidence about how social media users communicate with their audiences, and point towards more fine-grained models of collective attention that may help researchers and crisis response organizations to better understand public perception of unfolding crisis events.
Tasks
Published	2019-09-19
URL	https://arxiv.org/abs/1909.08784v3
PDF	https://arxiv.org/pdf/1909.08784v3.pdf
PWC	https://paperswithcode.com/paper/characterizing-collective-attention-via
Repo
Framework
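
The study compares bare location mentions with mentions that carry a contextual descriptor, such as “San Juan” versus “San Juan, Puerto Rico”. The abstract does not describe how descriptors are detected. The regex heuristic below is a hypothetical stand-in. It checks whether a known location name is immediately followed by a comma and a capitalized name. It misses other forms, such as "San Juan (PR)", and can be fooled by capitalized words that follow a comma for other reasons.

```python
import re

def has_descriptor(text: str, location: str) -> bool:
    """True if `location` is directly followed by ', <Capitalized Name...>' (e.g. ', Puerto Rico')."""
    pattern = re.escape(location) + r",\s+[A-Z][\w.]*(?:\s+[A-Z][\w.]*)*"
    return re.search(pattern, text) is not None
```

Counting `has_descriptor` hits per post over a crisis timeline would give a crude proxy for the "descriptor context use" whose decline the abstract reports.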

End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning


Title	End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning
Authors	Pavel Denisov, Ngoc Thang Vu
Abstract	This paper presents our latest investigation on end-to-end automatic speech recognition (ASR) for overlapped speech. We propose to train an end-to-end system conditioned on speaker embeddings and further improved by transfer learning from clean speech. This proposed framework does not require any parallel non-overlapped speech materials and is independent of the number of speakers. Our experimental results on overlapped speech datasets show that joint conditioning on speaker embeddings and transfer learning significantly improves the ASR performance.
Tasks	Speech Recognition, Transfer Learning
Published	2019-08-13
URL	https://arxiv.org/abs/1908.04737v1
PDF	https://arxiv.org/pdf/1908.04737v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-multi-speaker-speech-recognition
Repo
Framework