Paper Group ANR 148
SceneFlowFields++: Multi-frame Matching, Visibility Prediction, and Robust Interpolation for Scene Flow Estimation
Title | SceneFlowFields++: Multi-frame Matching, Visibility Prediction, and Robust Interpolation for Scene Flow Estimation |
Authors | René Schuster, Oliver Wasenmüller, Christian Unger, Georg Kuschk, Didier Stricker |
Abstract | State-of-the-art scene flow algorithms pursue the conflicting targets of accuracy, run time, and robustness. With the successful concept of pixel-wise matching and sparse-to-dense interpolation, we push the limits of scene flow estimation. Avoiding strong assumptions on the domain or the problem yields a more robust algorithm. This algorithm is fast because we avoid explicit regularization during matching, which allows an efficient computation. Using image information from multiple time steps and explicit visibility prediction based on previous results, we achieve competitive performance on different data sets. Our contributions and results are evaluated in comparative experiments. Overall, we present an accurate scene flow algorithm that is faster and more generic than any individual benchmark leader. |
Tasks | Scene Flow Estimation |
Published | 2019-02-26 |
URL | https://arxiv.org/abs/1902.10099v2 |
PDF | https://arxiv.org/pdf/1902.10099v2.pdf |
PWC | https://paperswithcode.com/paper/sceneflowfields-multi-frame-matching |
Repo | |
Framework | |
Transport Model for Feature Extraction
Title | Transport Model for Feature Extraction |
Authors | Wojciech Czaja, Dong Dong, Pierre-Emmanuel Jabin, Franck Olivier Ndjakou Njeunje |
Abstract | We present a new feature extraction method for complex and large datasets, based on the concept of transport operators on graphs. The proposed approach generalizes and extends the many existing data representation methodologies built upon diffusion processes, to a new domain where dynamical systems play a key role. The main advantage of this approach comes from the ability to exploit different relationships than those arising in the context of e.g., Graph Laplacians. Fundamental properties of the transport operators are proved. We demonstrate the flexibility of the method by introducing several diverse examples of transformations. We close the paper with a series of computational experiments and applications to the problem of classification of hyperspectral satellite imagery, to illustrate the practical implications of our algorithm and its ability to quantify new aspects of relationships within complicated datasets. |
Tasks | |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1910.14543v1 |
PDF | https://arxiv.org/pdf/1910.14543v1.pdf |
PWC | https://paperswithcode.com/paper/transport-model-for-feature-extraction |
Repo | |
Framework | |
Exploring Representativeness and Informativeness for Active Learning
Title | Exploring Representativeness and Informativeness for Active Learning |
Authors | Bo Du, Zengmao Wang, Lefei Zhang, Liangpei Zhang, Wei Liu, Jialie Shen, Dacheng Tao |
Abstract | How can we find a general way to choose the most suitable samples for training a classifier, even with very limited prior information? Active learning, which can be regarded as an iterative optimization procedure, plays a key role in constructing a refined training set to improve classification performance in a variety of applications, such as text analysis, image recognition, and social network modeling. Although combining the representativeness and informativeness of samples has proven promising for active sampling, state-of-the-art methods perform well only under certain data structures. Can we then find a way to fuse the two active sampling criteria without any assumption on the data? This paper proposes a general active learning framework that effectively fuses the two criteria. Inspired by a two-sample discrepancy problem, triple measures are designed to guarantee that the query samples not only possess the representativeness of the unlabeled data but also reveal the diversity of the labeled data. Any appropriate similarity measure can be employed to construct the triple measures. Meanwhile, an uncertainty measure is leveraged to generate the informativeness criterion, which can be carried out in different ways. Rooted in this framework, a practical active learning algorithm is proposed, which exploits a radial basis function together with the estimated probabilities to construct the triple measures and a modified Best-versus-Second-Best strategy to construct the uncertainty measure. Experimental results on benchmark datasets demonstrate that our algorithm consistently achieves superior performance over the state-of-the-art active learning algorithms. |
Tasks | Active Learning |
Published | 2019-04-14 |
URL | http://arxiv.org/abs/1904.06685v1 |
PDF | http://arxiv.org/pdf/1904.06685v1.pdf |
PWC | https://paperswithcode.com/paper/exploring-representativeness-and |
Repo | |
Framework | |
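The abstract above names a Best-versus-Second-Best (BvSB) strategy as the basis of its uncertainty measure. As a rough illustration only, the sketch below implements the plain, widely used BvSB margin (top class probability minus runner-up), not the modified variant the authors propose, whose details the abstract does not give. The function names `bvsb_margin` and `select_queries` and the toy probabilities are hypothetical.

```python
from typing import List, Sequence


def bvsb_margin(probs: Sequence[float]) -> float:
    """Best-minus-second-best class probability; a small margin means high uncertainty."""
    if len(probs) < 2:
        raise ValueError("need at least two class probabilities")
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def select_queries(pool_probs: List[Sequence[float]], k: int) -> List[int]:
    """Indices of the k unlabeled samples with the smallest BvSB margin."""
    order = sorted(range(len(pool_probs)), key=lambda i: bvsb_margin(pool_probs[i]))
    return order[:k]


# Toy estimated class probabilities for three unlabeled samples (made up).
pool = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # ambiguous between the top two classes
    [0.50, 0.49, 0.01],  # most ambiguous
]
print(select_queries(pool, 2))  # [2, 1]
```

In the framework described above, a score of this kind would supply only the informativeness criterion; the representativeness side (the triple measures) is a separate component that this sketch does not attempt.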
Fast Glare Detection in Document Images
Title | Fast Glare Detection in Document Images |
Authors | Dmitry Rodin, Nikita Orlov |
Abstract | Glare is a phenomenon that occurs when the scene has a reflection of a light source or has one in it. This luminescence can hide useful information from the image, making text recognition virtually impossible. In this paper, we propose an approach to detect glare in images taken by users via mobile devices. Our method divides the document into blocks and collects luminance features from the original image and black-and-white stroke histograms from the binarized image. Finally, glare is detected using a convolutional neural network on the aforementioned histograms and luminance features. The network consists of several feature extraction blocks, one for each type of input, and the detection block, which calculates the resulting glare heatmap based on the output of the extraction part. The proposed solution detects glare with high recall and F-score. |
Tasks | |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1911.05189v1 |
PDF | https://arxiv.org/pdf/1911.05189v1.pdf |
PWC | https://paperswithcode.com/paper/fast-glare-detection-in-document-images |
Repo | |
Framework | |
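The block-wise feature collection described in the abstract above can be pictured with a small sketch. The abstract does not say which luminance statistics are used, so the per-block mean below is an assumption chosen for simplicity, and the function name `block_mean_luminance` and the toy image are hypothetical. The stroke histograms and the convolutional network are not covered.

```python
from typing import List


def block_mean_luminance(gray: List[List[int]], block: int) -> List[List[float]]:
    """Mean luminance of each block x block tile (tiles at the edges may be smaller)."""
    if block < 1:
        raise ValueError("block must be >= 1")
    height = len(gray)
    width = len(gray[0]) if height else 0
    grid = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            # Gather the pixels of this tile, clipping at the image border.
            tile = [
                gray[y][x]
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            ]
            row.append(sum(tile) / len(tile))
        grid.append(row)
    return grid


# Toy 4x4 grayscale image with one very bright (glare-like) 2x2 region.
image = [
    [10, 10, 250, 250],
    [10, 10, 250, 250],
    [20, 20, 30, 30],
    [20, 20, 30, 30],
]
print(block_mean_luminance(image, 2))  # [[10.0, 250.0], [20.0, 30.0]]
```

A grid like this has the same layout as a coarse heatmap, which is presumably why block-level features pair naturally with a network that outputs a glare heatmap.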
On The Radon–Nikodym Spectral Approach With Optimal Clustering
Title | On The Radon–Nikodym Spectral Approach With Optimal Clustering |
Authors | Vladislav Gennadievich Malyshkin |
Abstract | Problems of interpolation, classification, and clustering are considered. In the Radon–Nikodym approach $\langle f(\mathbf{x})\psi^2 \rangle / \langle\psi^2\rangle$, where $\psi(\mathbf{x})$ is a linear function of the input attributes, all the answers are obtained from a generalized eigenproblem $\vert f \vert \psi^{[i]}\rangle = \lambda^{[i]} \vert\psi^{[i]}\rangle$. The solution to the interpolation problem is a regular Radon–Nikodym derivative. The solution to the classification problem requires prior and posterior probabilities that are obtained using the Lebesgue quadrature[1] technique. Whereas in a Bayesian approach new observations change only outcome probabilities, in the Radon–Nikodym approach not only outcome probabilities but also the probability space $\vert\psi^{[i]}\rangle$ change with new observations. This is a remarkable feature of the approach: both the probabilities and the probability space are constructed from the data. The Lebesgue quadrature technique can also be applied to the optimal clustering problem. The problem is solved by constructing a Gaussian quadrature on the Lebesgue measure. A distinguishing feature of the Radon–Nikodym approach is the knowledge of the invariant group: all the answers are invariant with respect to any non-degenerate linear transform of the components of the input vector $\mathbf{x}$. A software product implementing the algorithms of interpolation, classification, and optimal clustering is available from the authors. |
Tasks | |
Published | 2019-06-02 |
URL | https://arxiv.org/abs/1906.00460v9 |
PDF | https://arxiv.org/pdf/1906.00460v9.pdf |
PWC | https://paperswithcode.com/paper/190600460 |
Repo | |
Framework | |
Exponential-Binary State-Space Search
Title | Exponential-Binary State-Space Search |
Authors | Nathan Sturtevant, Malte Helmert |
Abstract | Iterative deepening search is used in applications where the best cost bound for state-space search is unknown. The iterative deepening process is used to avoid overshooting the appropriate cost bound and doing too much work as a result. However, iterative deepening search also does too much work if the cost bound grows too slowly. This paper proposes a new framework for iterative deepening search called exponential-binary state-space search. The approach interleaves exponential and binary searches to find the desired cost bound, reducing the worst-case overhead from polynomial to logarithmic. Exponential-binary search can be used with bounded depth-first search to improve the worst-case performance of IDA* and with breadth-first heuristic search to improve the worst-case performance of search with inconsistent heuristics. |
Tasks | |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.02912v1 |
PDF | https://arxiv.org/pdf/1906.02912v1.pdf |
PWC | https://paperswithcode.com/paper/exponential-binary-state-space-search |
Repo | |
Framework | |
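The idea of interleaving exponential and binary searches to locate a cost bound can be illustrated with the generic textbook pattern below. This is a sketch under simplifying assumptions, not the algorithm from the paper: it assumes a monotone predicate over integer bounds (once a bound succeeds, every larger bound succeeds too) and ignores the node-expansion accounting that the paper's worst-case guarantees depend on. The name `exponential_binary_search` and the predicate `succeeds` are hypothetical.

```python
from typing import Callable


def exponential_binary_search(succeeds: Callable[[int], bool], start: int = 1) -> int:
    """Smallest bound b >= start with succeeds(b) True, assuming a monotone predicate.

    Does not terminate if no bound ever succeeds.
    """
    if start < 1:
        raise ValueError("start must be >= 1")
    # Exponential phase: double the bound until the predicate holds.
    # Invariant: every candidate bound in [start, lo] has been shown to fail.
    lo, hi = start - 1, start
    while not succeeds(hi):
        lo, hi = hi, hi * 2
    # Binary phase: shrink the interval (lo, hi] down to a single bound.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if succeeds(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Toy predicate: a bounded search "succeeds" once the bound reaches 37.
print(exponential_binary_search(lambda bound: bound >= 37))  # 37
```

The doubling phase keeps the number of failed probes logarithmic in the final bound, and the binary phase avoids settling for a bound that overshoots the smallest sufficient one.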
A machine vision meta-algorithm for automated recognition of underwater objects using sidescan sonar imagery
Title | A machine vision meta-algorithm for automated recognition of underwater objects using sidescan sonar imagery |
Authors | Guillaume Labbe-Morissette, Sylvain Gauthier |
Abstract | This paper details a new method to recognize and detect underwater objects in real-time sidescan sonar data imagery streams, with case studies in underwater archeology and ghost fishing gear retrieval. We first synthesize images from sidescan data, apply geometric and radiometric corrections, then use 2D feature detection algorithms to identify point clouds of descriptive visual microfeatures such as corners and edges in the sonar images. We then apply a clustering algorithm on the feature point clouds to group feature sets into regions of interest and reject false positives, yielding a georeferenced inventory of objects. |
Tasks | |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.07763v1 |
PDF | https://arxiv.org/pdf/1909.07763v1.pdf |
PWC | https://paperswithcode.com/paper/a-machine-vision-meta-algorithm-for-automated |
Repo | |
Framework | |
Predicting Novel Views Using Generative Adversarial Query Network
Title | Predicting Novel Views Using Generative Adversarial Query Network |
Authors | Phong Nguyen-Ha, Lam Huynh, Esa Rahtu, Janne Heikkila |
Abstract | The problem of predicting a novel view of the scene using an arbitrary number of observations is a challenging problem for computers as well as for humans. This paper introduces the Generative Adversarial Query Network (GAQN), a general learning framework for novel view synthesis that combines Generative Query Network (GQN) and Generative Adversarial Networks (GANs). The conventional GQN encodes input views into a latent representation that is used to generate a new view through a recurrent variational decoder. The proposed GAQN builds on this work by adding two novel aspects: First, we extend the current GQN architecture with an adversarial loss function for improving the visual quality and convergence speed. Second, we introduce a feature-matching loss function for stabilizing the training procedure. The experiments demonstrate that GAQN is able to produce high-quality results and faster convergence compared to the conventional approach. |
Tasks | Novel View Synthesis |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.05124v1 |
PDF | http://arxiv.org/pdf/1904.05124v1.pdf |
PWC | https://paperswithcode.com/paper/predicting-novel-views-using-generative |
Repo | |
Framework | |
Multiview Hessian Regularization for Image Annotation
Title | Multiview Hessian Regularization for Image Annotation |
Authors | Weifeng Liu, Dacheng Tao |
Abstract | The rapid development of computer hardware and Internet technology makes large-scale data-dependent models computationally tractable, and opens a bright avenue for annotating images through innovative machine learning algorithms. Semi-supervised learning (SSL) has consequently received intensive attention in recent years and has been successfully deployed in image annotation. One representative work in SSL is Laplacian regularization (LR), which smooths the conditional distribution for classification along the manifold encoded in the graph Laplacian; however, it has been observed that LR biases the classification function towards a constant function, which possibly results in poor generalization. In addition, LR is developed to handle uniformly distributed data (or single-view data), although instances or objects, such as images and videos, are usually represented by multiview features, such as color, shape, and texture. In this paper, we present multiview Hessian regularization (mHR) to address the above two problems in LR-based image annotation. In particular, mHR optimally combines multiple Hessian regularizations, each of which is obtained from a particular view of instances, and steers the classification function which varies linearly along the data manifold. We apply mHR to kernel least squares and support vector machines as two examples for image annotation. Extensive experiments on the PASCAL VOC’07 dataset validate the effectiveness of mHR by comparing it with baseline algorithms, including LR and HR. |
Tasks | |
Published | 2019-04-23 |
URL | http://arxiv.org/abs/1904.10100v1 |
PDF | http://arxiv.org/pdf/1904.10100v1.pdf |
PWC | https://paperswithcode.com/paper/multiview-hessian-regularization-for-image |
Repo | |
Framework | |
Learning Ising Models with Independent Failures
Title | Learning Ising Models with Independent Failures |
Authors | Surbhi Goel, Daniel M. Kane, Adam R. Klivans |
Abstract | We give the first efficient algorithm for learning the structure of an Ising model that tolerates independent failures; that is, each entry of the observed sample is missing with some unknown probability p. Our algorithm matches the essentially optimal runtime and sample complexity bounds of recent work for learning Ising models due to Klivans and Meka (2017). We devise a novel unbiased estimator for the gradient of the Interaction Screening Objective (ISO) due to Vuffray et al. (2016) and apply a stochastic multiplicative gradient descent algorithm to minimize this objective. Solutions to this minimization recover the neighborhood information of the underlying Ising model on a node by node basis. |
Tasks | |
Published | 2019-02-13 |
URL | http://arxiv.org/abs/1902.04728v1 |
PDF | http://arxiv.org/pdf/1902.04728v1.pdf |
PWC | https://paperswithcode.com/paper/learning-ising-models-with-independent |
Repo | |
Framework | |
Landmark-Based Approaches for Goal Recognition as Planning
Title | Landmark-Based Approaches for Goal Recognition as Planning |
Authors | Ramon Fraga Pereira, Nir Oren, Felipe Meneguzzi |
Abstract | The task of recognizing goals and plans from missing and full observations can be done efficiently by using automated planning techniques. In many applications, it is important to recognize goals and plans not only accurately, but also quickly. To address this challenge, we develop novel goal recognition approaches based on planning techniques that rely on planning landmarks. In automated planning, landmarks are properties (or actions) that cannot be avoided to achieve a goal. We show the applicability of a number of planning techniques with an emphasis on landmarks for goal and plan recognition tasks in two settings: (1) we use the concept of landmarks to develop goal recognition heuristics; and (2) we develop a landmark-based filtering method to refine existing planning-based goal and plan recognition approaches. These recognition approaches are empirically evaluated in experiments over several classical planning domains. We show that our goal recognition approaches yield not only accuracy comparable to (and often higher than) other state-of-the-art techniques, but also substantially faster recognition time over such techniques. |
Tasks | |
Published | 2019-04-26 |
URL | https://arxiv.org/abs/1904.11739v2 |
PDF | https://arxiv.org/pdf/1904.11739v2.pdf |
PWC | https://paperswithcode.com/paper/landmark-based-approaches-for-goal |
Repo | |
Framework | |
Voice Mimicry Attacks Assisted by Automatic Speaker Verification
Title | Voice Mimicry Attacks Assisted by Automatic Speaker Verification |
Authors | Ville Vestman, Tomi Kinnunen, Rosa González Hautamäki, Md Sahidullah |
Abstract | In this work, we simulate a scenario where a publicly available ASV system is used to enhance mimicry attacks against another closed source ASV system. Specifically, ASV technology is used to perform a similarity search between the voices of recruited attackers (6) and potential target speakers (7,365) from VoxCeleb corpora to find the closest targets for each of the attackers. In addition, we consider ‘median’, ‘furthest’, and ‘common’ targets to serve as reference points. Our goal is to gain insights into how well similarity rankings transfer from the attacker’s ASV system to the attacked ASV system, whether the attackers are able to improve their attacks by mimicking, and how the properties of the voices of attackers change due to mimicking. We address these questions through ASV experiments, listening tests, and prosodic and formant analyses. For the ASV experiments, we use i-vector technology on the attacker side, and x-vectors on the attacked side. For the listening tests, we recruit listeners through crowdsourcing. The results of the ASV experiments indicate that the speaker similarity scores transfer well from one ASV system to another. Both the ASV experiments and the listening tests reveal that the mimicry attempts do not, in general, help in bringing the attacker’s scores closer to the target’s. A detailed analysis shows that mimicking does not improve attacks when the natural voices of attackers and targets are similar to each other. The analysis of prosody and formants suggests that the attackers were able to considerably change their speaking rates when mimicking, but the changes in F0 and formants were modest. Overall, the results suggest that untrained impersonators do not pose a high threat towards ASV systems, but the use of ASV systems to attack other ASV systems is a potential threat. |
Tasks | Speaker Verification |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.01454v2 |
PDF | https://arxiv.org/pdf/1906.01454v2.pdf |
PWC | https://paperswithcode.com/paper/voice-mimicry-attacks-assisted-by-automatic |
Repo | |
Framework | |
Towards Unsupervised Image Captioning with Shared Multimodal Embeddings
Title | Towards Unsupervised Image Captioning with Shared Multimodal Embeddings |
Authors | Iro Laina, Christian Rupprecht, Nassir Navab |
Abstract | Understanding images without explicit supervision has become an important problem in computer vision. In this paper, we address image captioning by generating language descriptions of scenes without learning from annotated pairs of images and their captions. The core component of our approach is a shared latent space that is structured by visual concepts. In this space, the two modalities should be indistinguishable. A language model is first trained to encode sentences into semantically structured embeddings. Image features that are translated into this embedding space can be decoded into descriptions through the same language model, similarly to sentence embeddings. This translation is learned from weakly paired images and text using a loss robust to noisy assignments and a conditional adversarial component. Our approach makes it possible to exploit large text corpora outside the annotated distributions of image/caption data. Our experiments show that the proposed domain alignment learns a semantically meaningful representation which outperforms previous work. |
Tasks | Image Captioning, Language Modelling, Sentence Embeddings |
Published | 2019-08-25 |
URL | https://arxiv.org/abs/1908.09317v1 |
PDF | https://arxiv.org/pdf/1908.09317v1.pdf |
PWC | https://paperswithcode.com/paper/towards-unsupervised-image-captioning-with |
Repo | |
Framework | |
Characterizing Collective Attention via Descriptor Context: A Case Study of Public Discussions of Crisis Events
Title | Characterizing Collective Attention via Descriptor Context: A Case Study of Public Discussions of Crisis Events |
Authors | Ian Stewart, Diyi Yang, Jacob Eisenstein |
Abstract | Social media datasets make it possible to rapidly quantify collective attention to emerging topics and breaking news, such as crisis events. Collective attention is typically measured by aggregate counts, such as the number of posts that mention a name or hashtag. But according to rationalist models of natural language communication, the collective salience of each entity will be expressed not only in how often it is mentioned, but in the form that those mentions take. This is because natural language communication is premised on (and customized to) the expectations that speakers and writers have about how their messages will be interpreted by the intended audience. We test this idea by conducting a large-scale analysis of public online discussions of breaking news events on Facebook and Twitter, focusing on five recent crisis events. We examine how people refer to locations, focusing specifically on contextual descriptors, such as “San Juan” versus “San Juan, Puerto Rico.” Rationalist accounts of natural language communication predict that such descriptors will be unnecessary (and therefore omitted) when the named entity is expected to have high prior salience to the reader. We find that the use of contextual descriptors is indeed associated with proxies for social and informational expectations, including macro-level factors like the location’s global salience and micro-level factors like audience engagement. We also find a consistent decrease in descriptor context use over the lifespan of each crisis event. These findings provide evidence about how social media users communicate with their audiences, and point towards more fine-grained models of collective attention that may help researchers and crisis response organizations to better understand public perception of unfolding crisis events. |
Tasks | |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.08784v3 |
PDF | https://arxiv.org/pdf/1909.08784v3.pdf |
PWC | https://paperswithcode.com/paper/characterizing-collective-attention-via |
Repo | |
Framework | |
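To make the notion of a contextual descriptor concrete, the sketch below counts how often a location name is followed by a descriptor of the form ", Capitalized Phrase" (as in "San Juan, Puerto Rico") versus mentioned bare. It is a deliberately crude regular-expression heuristic written for illustration; the abstract does not describe the authors' extraction pipeline, and the function name `descriptor_counts` and the example posts are hypothetical.

```python
import re
from typing import Iterable, Tuple


def descriptor_counts(posts: Iterable[str], name: str) -> Tuple[int, int]:
    """Return (mentions followed by ', <Capitalized phrase>', bare mentions)."""
    # After the name, optionally match a comma plus one or more capitalized words.
    pattern = re.compile(
        re.escape(name) + r"(?P<desc>,\s+[A-Z]\w*(?:\s+[A-Z]\w*)*)?"
    )
    with_desc = bare = 0
    for post in posts:
        for match in pattern.finditer(post):
            if match.group("desc"):
                with_desc += 1
            else:
                bare += 1
    return with_desc, bare


# Made-up posts for illustration only.
posts = [
    "Power is still out in San Juan, Puerto Rico.",
    "Volunteers arrived in San Juan today.",
    "San Juan needs water, food and fuel.",
]
print(descriptor_counts(posts, "San Juan"))  # (1, 2)
```

A heuristic like this would miscount cases such as a name followed by a comma and a new capitalized clause, so a real analysis would need a more careful notion of descriptor, for example one based on parsing or a gazetteer.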
End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning
Title | End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning |
Authors | Pavel Denisov, Ngoc Thang Vu |
Abstract | This paper presents our latest investigation on end-to-end automatic speech recognition (ASR) for overlapped speech. We propose to train an end-to-end system conditioned on speaker embeddings and further improved by transfer learning from clean speech. This proposed framework does not require any parallel non-overlapped speech materials and is independent of the number of speakers. Our experimental results on overlapped speech datasets show that joint conditioning on speaker embeddings and transfer learning significantly improves the ASR performance. |
Tasks | Speech Recognition, Transfer Learning |
Published | 2019-08-13 |
URL | https://arxiv.org/abs/1908.04737v1 |
PDF | https://arxiv.org/pdf/1908.04737v1.pdf |
PWC | https://paperswithcode.com/paper/end-to-end-multi-speaker-speech-recognition |
Repo | |
Framework | |