Paper Group ANR 1651
Rosetta: Large scale system for text detection and recognition in images. Asynchronous Single-Photon 3D Imaging. Zap Q-Learning for Optimal Stopping Time Problems. Fingerprint Recognition under Missing Image Pixels Scenario. Privacy preserving Neural Network Inference on Encrypted Data with GPUs. Tackling Initial Centroid of K-Means with Distance P …
Rosetta: Large scale system for text detection and recognition in images
Title | Rosetta: Large scale system for text detection and recognition in images |
Authors | Fedor Borisyuk, Albert Gordo, Viswanath Sivakumar |
Abstract | In this paper we present a deployed, scalable optical character recognition (OCR) system, which we call Rosetta, designed to process images uploaded daily at Facebook scale. Sharing of image content has become one of the primary ways to communicate information among internet users within social networks such as Facebook and Instagram, and the understanding of such media, including its textual information, is of paramount importance to facilitate search and recommendation applications. We present modeling techniques for efficient detection and recognition of text in images and describe Rosetta’s system architecture. We perform an extensive evaluation of the presented technologies, explain useful practical approaches to building an OCR system at scale, and provide insightful intuitions as to why and how certain components work based on the lessons learnt during the development and deployment of the system. |
Tasks | Optical Character Recognition |
Published | 2019-10-11 |
URL | https://arxiv.org/abs/1910.05085v1 |
https://arxiv.org/pdf/1910.05085v1.pdf | |
PWC | https://paperswithcode.com/paper/rosetta-large-scale-system-for-text-detection |
Repo | |
Framework | |
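The Rosetta abstract above describes a two-stage pipeline: detect text regions, then recognize the cropped words. As a rough illustration of the recognition stage only, here is a minimal fully-convolutional word recognizer trained with CTC loss in PyTorch; the layer sizes, 36-character alphabet, and 32x128 input resolution are assumptions made for this sketch and are not taken from the paper.

```python
import torch
import torch.nn as nn

class WordRecognizer(nn.Module):
    """Minimal CTC-based word recognizer: the width axis of the cropped word
    image becomes the time axis of the output sequence (a sketch, not the
    deployed Rosetta architecture)."""
    def __init__(self, num_classes, height=32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # H/2, W/2
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # H/4, W/4
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),   # collapse height, keep width as time steps
        )
        self.classifier = nn.Conv1d(256, num_classes + 1, 1)  # +1 for the CTC blank

    def forward(self, x):                 # x: (B, 1, H, W)
        f = self.backbone(x).squeeze(2)   # (B, 256, W/4)
        logits = self.classifier(f)       # (B, C, W/4)
        return logits.permute(2, 0, 1).log_softmax(-1)  # (T, B, C) as CTCLoss expects

model = WordRecognizer(num_classes=36)            # e.g. a-z plus 0-9 (assumed alphabet)
images = torch.randn(4, 1, 32, 128)               # cropped word images from a detector
log_probs = model(images)
targets = torch.randint(1, 37, (4, 7))            # dummy label sequences (blank = 0)
loss = nn.CTCLoss(blank=0)(
    log_probs, targets,
    input_lengths=torch.full((4,), log_probs.size(0), dtype=torch.long),
    target_lengths=torch.full((4,), 7, dtype=torch.long),
)
```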
Asynchronous Single-Photon 3D Imaging
Title | Asynchronous Single-Photon 3D Imaging |
Authors | Anant Gupta, Atul Ingle, Mohit Gupta |
Abstract | Single-photon avalanche diodes (SPADs) are becoming popular in time-of-flight depth-ranging due to their unique ability to capture individual photons with picosecond timing resolution. However, ambient light (e.g., sunlight) incident on a SPAD-based 3D camera leads to severe non-linear distortions (pileup) in the measured waveform, resulting in large depth errors. We propose asynchronous single-photon 3D imaging, a family of acquisition schemes to mitigate pileup during data acquisition itself. Asynchronous acquisition temporally misaligns the SPAD measurement windows and the laser cycles, using deterministically predefined or randomized offsets. Our key insight is that pileup distortions can be “averaged out” by choosing a sequence of offsets that span the entire depth range. We develop a generalized image formation model and perform theoretical analysis to explore the space of asynchronous acquisition schemes and design high-performance schemes. Our simulations and experiments demonstrate an improvement in depth accuracy of up to an order of magnitude as compared to the state-of-the-art, across a wide range of imaging scenarios, including those with high ambient flux. |
Tasks | |
Published | 2019-08-18 |
URL | https://arxiv.org/abs/1908.06372v1 |
https://arxiv.org/pdf/1908.06372v1.pdf | |
PWC | https://paperswithcode.com/paper/asynchronous-single-photon-3d-imaging |
Repo | |
Framework | |
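A toy simulation of the pile-up-averaging idea described in the abstract above: a SPAD records only the first photon per laser cycle, so strong ambient light biases the histogram toward early bins; shifting the start of the measurement window across cycles spreads that bias over the full depth range. The bin count, photon rates, and number of offsets below are assumed values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
B = 1000                       # time bins per laser period
true_bin = 700                 # bin containing the laser return
signal, ambient = 0.02, 0.005  # assumed mean photons per bin

p = np.full(B, ambient)        # per-bin detection probability within one cycle
p[true_bin] += signal
p = 1 - np.exp(-p)

def acquire(num_cycles, offsets):
    """Each cycle starts at a (possibly shifted) bin and records only the
    FIRST detected photon -- the source of pile-up distortion."""
    hist = np.zeros(B)
    for c in range(num_cycles):
        start = offsets[c % len(offsets)]
        order = (np.arange(B) + start) % B          # bins in the order the SPAD sees them
        detected = rng.random(B) < p[order]
        if detected.any():
            hist[order[detected.argmax()]] += 1     # first detection only
    return hist

sync_hist = acquire(20000, offsets=[0])                                     # synchronous
async_hist = acquire(20000, offsets=np.linspace(0, B, 64,                   # asynchronous
                                                endpoint=False).astype(int))
print("synchronous peak bin:", sync_hist.argmax(), "asynchronous peak bin:", async_hist.argmax())
```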
Zap Q-Learning for Optimal Stopping Time Problems
Title | Zap Q-Learning for Optimal Stopping Time Problems |
Authors | Shuhang Chen, Adithya M. Devraj, Ana Bušić, Sean P. Meyn |
Abstract | The objective in this paper is to obtain fast converging reinforcement learning algorithms to approximate solutions to the problem of discounted-cost optimal stopping in an irreducible, uniformly ergodic Markov chain, evolving on a compact subset of $\mathbb{R}^n$. We build on the dynamic programming approach taken by Tsitsiklis and Van Roy, wherein they propose a Q-learning algorithm to estimate the optimal state-action value function, which then defines an optimal stopping rule. We provide insights as to why the convergence rate of this algorithm can be slow, and propose a fast-converging alternative, the “Zap-Q-learning” algorithm, designed to achieve an optimal rate of convergence. For the first time, we prove the convergence of the Zap-Q-learning algorithm in the linear function approximation setting. We use ODE analysis for the proof, and the optimal asymptotic variance property of the algorithm is reflected via fast convergence in a finance example. |
Tasks | Q-Learning |
Published | 2019-04-25 |
URL | https://arxiv.org/abs/1904.11538v3 |
https://arxiv.org/pdf/1904.11538v3.pdf | |
PWC | https://paperswithcode.com/paper/zapq-learning-for-optimal-stopping-time |
Repo | |
Framework | |
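Below is a rough, self-contained sketch of Q-learning for optimal stopping with linear function approximation in the style of Tsitsiklis and Van Roy, with a matrix-gain ("Zap-style") preconditioner replacing the scalar step size. The polynomial basis, toy state dynamics, put-like stopping reward, and gain schedule are all assumptions made for illustration; this does not reproduce the paper's exact algorithm or its analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 0.98                      # discount factor
d = 4                            # number of basis functions (an assumption)

def features(x):                 # simple polynomial basis
    return np.array([1.0, x, x ** 2, x ** 3])

def stop_reward(x):              # reward collected upon stopping (put-like payoff)
    return max(1.0 - x, 0.0)

def step(x):                     # toy ergodic state dynamics on [0, 2]
    return float(np.clip(0.9 * x + 0.1 + 0.05 * rng.standard_normal(), 0.0, 2.0))

theta = np.zeros(d)              # Q(x) ~ features(x) @ theta = value of continuing
A_hat = -np.eye(d)               # running estimate of the mean-update Jacobian (Zap gain)
x = 1.0
for n in range(1, 100001):
    x_next = step(x)
    phi, phi_next = features(x), features(x_next)
    q_next = max(stop_reward(x_next), phi_next @ theta)
    td = beta * q_next - phi @ theta                   # temporal-difference error
    # Jacobian of the update direction w.r.t. theta (zero when stopping dominates)
    grad_next = phi_next if phi_next @ theta >= stop_reward(x_next) else np.zeros(d)
    A_n = np.outer(phi, beta * grad_next - phi)
    A_hat += n ** -0.85 * (A_n - A_hat)                # gain estimated on a faster timescale
    # Zap-style matrix-gain step: theta <- theta - (1/n) * A_hat^{-1} (td * phi)
    theta -= (1.0 / n) * np.linalg.lstsq(A_hat, td * phi, rcond=None)[0]
    x = x_next

print("fitted weights:", theta)
```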
Fingerprint Recognition under Missing Image Pixels Scenario
Title | Fingerprint Recognition under Missing Image Pixels Scenario |
Authors | Dejan Brajovic, Kristina Tomovic, Jovan Radonjic |
Abstract | This work considers the problem of fingerprint image recognition when pixels are missing from the original image. The possibility of recovering the missing pixels is tested by applying the Compressive Sensing approach. Namely, different percentages of missing pixels are considered, and the image reconstruction is done by applying a commonly used approach for sparse image reconstruction. The theory is verified by experiments, showing successful image reconstruction and subsequent person identification even when less than 90% of the image pixels are missing. |
Tasks | Compressive Sensing, Image Reconstruction, Person Identification |
Published | 2019-02-06 |
URL | http://arxiv.org/abs/1902.05389v1 |
http://arxiv.org/pdf/1902.05389v1.pdf | |
PWC | https://paperswithcode.com/paper/fingerprint-recognition-under-missing-image |
Repo | |
Framework | |
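The abstract above does not name the specific sparse-reconstruction solver, so the following is only a generic illustration of recovering missing pixels by enforcing sparsity in a transform domain (iterative 2-D DCT thresholding); the synthetic ridge-like test pattern, 70% missing rate, and thresholding schedule are assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn

def inpaint_sparse_dct(img, mask, n_iter=100, keep=0.10):
    """Fill missing pixels by alternating between hard-thresholding the 2-D DCT
    coefficients (enforcing sparsity) and re-imposing the observed pixels.
    `mask` is True where pixels are observed."""
    x = np.where(mask, img, img[mask].mean())        # initialize missing pixels
    for _ in range(n_iter):
        c = dctn(x, norm="ortho")
        thr = np.quantile(np.abs(c), 1 - keep)       # keep only the largest coefficients
        c[np.abs(c) < thr] = 0.0
        x = idctn(c, norm="ortho")
        x[mask] = img[mask]                          # re-impose the known pixels
    return x

# usage: drop 70% of the pixels of a smooth, DCT-sparse test pattern and recover it
rng = np.random.default_rng(1)
u, v = np.meshgrid(np.linspace(0, 4 * np.pi, 64), np.linspace(0, 4 * np.pi, 64))
img = 0.5 + 0.5 * np.sin(u) * np.cos(2 * v)          # ridge-like synthetic patch
mask = rng.random(img.shape) > 0.7                   # ~30% of pixels observed
rec = inpaint_sparse_dct(img, mask)
print("mean error on missing pixels:", np.abs(rec - img)[~mask].mean())
```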
Privacy preserving Neural Network Inference on Encrypted Data with GPUs
Title | Privacy preserving Neural Network Inference on Encrypted Data with GPUs |
Authors | Daniel Takabi, Robert Podschwadt, Jeff Druce, Curt Wu, Kevin Procopio |
Abstract | Machine Learning as a Service (MLaaS) has become a growing trend in recent years and several such services are currently offered. MLaaS is essentially a set of services that provides machine learning tools and capabilities as part of cloud computing services. In these settings, the cloud has deployed pre-trained models and large computing capacity, whereas the clients can use these models to make predictions without having to worry about maintaining the models and the service. However, the main concern with MLaaS is the privacy of the client’s data. Although several approaches have been proposed in the literature to run machine learning models on encrypted data, the performance is still far from satisfactory for practical use. In this paper, we aim to accelerate the performance of running machine learning on encrypted data using a combination of Fully Homomorphic Encryption (FHE), Convolutional Neural Networks (CNNs) and Graphics Processing Units (GPUs). We use a number of optimization techniques and an efficient GPU-based implementation to achieve high performance. We evaluate a CNN whose architecture is similar to AlexNet to classify homomorphically encrypted samples from the Cars Overhead With Context (COWC) dataset. To the best of our knowledge, this is the first time such a complex network and such a large dataset have been evaluated on encrypted data. Our approach achieved a reasonable classification accuracy of 95% on the COWC dataset. In terms of performance, our results show a speedup of several thousand times when we implement GPU-accelerated FHE operations on encrypted floating-point numbers. |
Tasks | |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11377v1 |
https://arxiv.org/pdf/1911.11377v1.pdf | |
PWC | https://paperswithcode.com/paper/privacy-preserving-neural-network-inference |
Repo | |
Framework | |
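The abstract above does not spell out how the network is made compatible with homomorphic evaluation, so the snippet below only illustrates a preprocessing step commonly used in FHE inference in general (e.g., CryptoNets-style systems): nonlinearities that homomorphic schemes cannot evaluate directly are replaced by operations built solely from additions and multiplications. All values here are plaintext stand-ins; no encryption library is invoked, and this is not the paper's implementation.

```python
import numpy as np

def square(x):          # FHE-friendly non-linearity: a single ciphertext multiplication
    return x * x

def avg_pool(x, k=2):   # FHE-friendly pooling: only additions plus a plaintext scaling
    h, w = x.shape[0] // k * k, x.shape[1] // k * k
    return x[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

# ReLU and max-pooling require comparisons, which homomorphic schemes cannot
# evaluate natively, so an FHE-friendly network swaps them for polynomial
# operations like the ones above before the model is encrypted-evaluated.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 8))     # plaintext stand-in for a ciphertext tensor
print(avg_pool(square(feature_map)).shape)    # -> (4, 4)
```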
Tackling Initial Centroid of K-Means with Distance Part (DP-KMeans)
Title | Tackling Initial Centroid of K-Means with Distance Part (DP-KMeans) |
Authors | Ahmad Ilham, Danny Ibrahim, Luqman Assaffat, Achmad Solichan |
Abstract | The choice of initial centroids is a fairly challenging problem in the k-means method because it can affect the clustering results. In addition, the starting centroids of the clusters are not always chosen appropriately, especially when the number of groups increases. |
Tasks | |
Published | 2019-03-15 |
URL | http://arxiv.org/abs/1903.07977v1 |
http://arxiv.org/pdf/1903.07977v1.pdf | |
PWC | https://paperswithcode.com/paper/tackling-initial-centroid-of-k-means-with |
Repo | |
Framework | |
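The abstract above is truncated and does not describe the DP-KMeans procedure itself, so the sketch below only illustrates the general idea of distance-based centroid seeding (a farthest-point heuristic); it should not be read as the paper's method.

```python
import numpy as np

def farthest_point_init(X, k, seed=0):
    """Pick k well-separated initial centroids: start from a random point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        dists = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[dists.argmax()])      # farthest point from current seeds
    return np.stack(centroids)

X = np.random.default_rng(0).random((200, 2))    # toy 2-D data
print(farthest_point_init(X, 3))                 # three spread-out starting centroids
```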
A Novel Fuzzy Search Approach over Encrypted Data with Improved Accuracy and Efficiency
Title | A Novel Fuzzy Search Approach over Encrypted Data with Improved Accuracy and Efficiency |
Authors | Jinkun Cao, Jinhao Zhu, Liwei Lin, Zhengui Xue, Ruhui Ma, Haibing Guan |
Abstract | As cloud computing has become prevalent in recent years, more and more enterprises and individuals outsource their data to cloud servers. To avoid privacy leaks, outsourced data is usually encrypted before being sent to cloud servers, which disables traditional search schemes over plain text. To meet both ends of security and searchability, search-supported encryption has been proposed. However, many previous schemes suffer from severe vulnerabilities when typos and semantic diversity exist in query requests. To overcome this flaw, higher error tolerance is expected of search-supported encryption designs, sometimes termed ‘fuzzy search’. In this paper, we propose a new scheme for multi-keyword fuzzy search over encrypted and outsourced data. Our approach introduces a new mechanism to map a natural-language expression into a word-vector space. Compared with previous approaches, our design shows higher robustness when multiple kinds of typos are involved. In addition, our approach is enhanced with novel data structures to improve search efficiency. These two innovations work well for both accuracy and efficiency, and they do not hurt the fundamental security. Experiments on a real-world dataset demonstrate the effectiveness of our proposed approach, which outperforms currently popular approaches focusing on similar tasks. |
Tasks | |
Published | 2019-04-27 |
URL | https://arxiv.org/abs/1904.12111v2 |
https://arxiv.org/pdf/1904.12111v2.pdf | |
PWC | https://paperswithcode.com/paper/better-accuracy-and-efficiency-for-fuzzy |
Repo | |
Framework | |
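As a plain-text illustration of the "map a keyword into a word-vector space so that typos land nearby" idea mentioned in the abstract above, the sketch below hashes character bigrams into a fixed-size count vector and compares keywords by cosine similarity. The vector dimension and bigram scheme are assumptions; the paper's actual encrypted index and security mechanisms are not reproduced here.

```python
import numpy as np
from zlib import crc32

def bigram_vector(word, dim=1024):
    """Hash the character bigrams of `word` into a dim-dimensional count vector."""
    v = np.zeros(dim)
    w = f"#{word.lower()}#"                       # boundary markers
    for a, b in zip(w, w[1:]):
        v[crc32((a + b).encode()) % dim] += 1.0
    return v

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# a transposition typo changes only a couple of bigrams, so the similarity stays
# noticeably higher than for an unrelated keyword
print(cosine(bigram_vector("network"), bigram_vector("netwrok")))
print(cosine(bigram_vector("network"), bigram_vector("privacy")))
```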
Large-scale Pretraining for Neural Machine Translation with Tens of Billions of Sentence Pairs
Title | Large-scale Pretraining for Neural Machine Translation with Tens of Billions of Sentence Pairs |
Authors | Yuxian Meng, Xiangyuan Ren, Zijun Sun, Xiaoya Li, Arianna Yuan, Fei Wu, Jiwei Li |
Abstract | In this paper, we investigate the problem of training neural machine translation (NMT) systems with a dataset of more than 40 billion bilingual sentence pairs, which is larger than the largest dataset to date by orders of magnitude. Unprecedented challenges emerge in this situation compared to previous NMT work, including severe noise in the data and prohibitively long training times. We propose practical solutions to handle these issues and demonstrate that large-scale pretraining significantly improves NMT performance. We are able to push the BLEU score on the WMT17 Chinese-English dataset to 32.3, with a significant performance boost of +3.2 over existing state-of-the-art results. |
Tasks | Machine Translation |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.11861v3 |
https://arxiv.org/pdf/1909.11861v3.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-pretraining-for-neural-machine |
Repo | |
Framework | |
ALOHA: Artificial Learning of Human Attributes for Dialogue Agents
Title | ALOHA: Artificial Learning of Human Attributes for Dialogue Agents |
Authors | Aaron W. Li, Veronica Jiang, Steven Y. Feng, Julia Sprague, Wei Zhou, Jesse Hoey |
Abstract | For conversational AI and virtual assistants to communicate with humans in a realistic way, they must exhibit human characteristics such as expression of emotion and personality. Current attempts toward constructing human-like dialogue agents have presented significant difficulties. We propose Human Level Attributes (HLAs) based on tropes as the basis of a method for learning dialogue agents that can imitate the personalities of fictional characters. Tropes are characteristics of fictional personalities that are observed recurrently and determined by viewers’ impressions. By combining detailed HLA data with dialogue data for specific characters, we present a dataset, HLA-Chat, that models character profiles and gives dialogue agents the ability to learn characters’ language styles through their HLAs. We then introduce a three-component system, ALOHA (which stands for Artificial Learning of Human Attributes), that combines character space mapping, character community detection, and language style retrieval to build a character (or personality) specific language model. Our preliminary experiments demonstrate that two variations of ALOHA, combined with our proposed dataset, can outperform baseline models at identifying the correct dialogue responses of chosen target characters, and are stable regardless of the character’s identity, the genre of the show, and the context of the dialogue. |
Tasks | Community Detection, Language Modelling |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.08293v2 |
https://arxiv.org/pdf/1910.08293v2.pdf | |
PWC | https://paperswithcode.com/paper/follow-alice-into-the-rabbit-hole-giving |
Repo | |
Framework | |
Relational Reasoning using Prior Knowledge for Visual Captioning
Title | Relational Reasoning using Prior Knowledge for Visual Captioning |
Authors | Jingyi Hou, Xinxiao Wu, Yayun Qi, Wentian Zhao, Jiebo Luo, Yunde Jia |
Abstract | Exploiting relationships among objects has achieved remarkable progress in interpreting images or videos with natural language. Most existing methods resort to first detecting objects and their relationships and then generating textual descriptions, which heavily depends on pre-trained detectors and leads to performance drops in the presence of heavy occlusion, tiny objects, and long-tail distributions in object detection. In addition, the separate procedure of detecting and captioning results in semantic inconsistency between the pre-defined object/relation categories and the target lexical words. We exploit prior human commonsense knowledge to reason about relationships between objects without any pre-trained detectors and to reach semantic coherency within one image or video in captioning. The prior knowledge (e.g., in the form of a knowledge graph) provides commonsense semantic correlations and constraints between objects that are not explicit in the image or video, serving as useful guidance for building a semantic graph for sentence generation. In particular, we present a joint reasoning method that incorporates 1) commonsense reasoning for embedding image or video regions into a semantic space to build a semantic graph and 2) relational reasoning for encoding the semantic graph to generate sentences. Extensive experiments on the MS-COCO image captioning benchmark and the MSVD video captioning benchmark validate the superiority of our method in leveraging prior commonsense knowledge to enhance relational reasoning for visual captioning. |
Tasks | Image Captioning, Object Detection, Relational Reasoning, Video Captioning |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01290v1 |
https://arxiv.org/pdf/1906.01290v1.pdf | |
PWC | https://paperswithcode.com/paper/relational-reasoning-using-prior-knowledge |
Repo | |
Framework | |
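A generic sketch of the "relational reasoning for encoding the semantic graph" step described in the abstract above: region/word features are mixed along graph edges (here, edges that a prior knowledge graph might suggest) by one normalized graph-convolution layer. The feature sizes, adjacency, and single-layer form are assumptions; this is not the paper's exact formulation.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution step: normalize the adjacency (with self-loops),
    propagate node features along edges, project with W, apply ReLU."""
    A_hat = A + np.eye(A.shape[0])                       # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# toy usage: 4 semantic nodes with 8-d features, edges suggested by commonsense relations
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 8))                          # node (region/word) features
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)                # prior-knowledge edges
W = rng.standard_normal((8, 8))
print(gcn_layer(H, A, W).shape)                          # -> (4, 8) relation-aware features
```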
Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning
Title | Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning |
Authors | Wei Zhang, Bairui Wang, Lin Ma, Wei Liu |
Abstract | In this paper, the problem of describing visual contents of a video sequence with natural language is addressed. Unlike previous video captioning work mainly exploiting the cues of video contents to make a language description, we propose a reconstruction network (RecNet) in a novel encoder-decoder-reconstructor architecture, which leverages both forward (video to sentence) and backward (sentence to video) flows for video captioning. Specifically, the encoder-decoder component makes use of the forward flow to produce a sentence description based on the encoded video semantic features. Two types of reconstructors are subsequently proposed to employ the backward flow and reproduce the video features from local and global perspectives, respectively, capitalizing on the hidden state sequence generated by the decoder. Moreover, in order to make a comprehensive reconstruction of the video features, we propose to fuse the two types of reconstructors together. The generation loss yielded by the encoder-decoder component and the reconstruction loss introduced by the reconstructor are jointly cast into training the proposed RecNet in an end-to-end fashion. Furthermore, the RecNet is fine-tuned by CIDEr optimization via reinforcement learning, which significantly boosts the captioning performance. Experimental results on benchmark datasets demonstrate that the proposed reconstructor can boost the performance of video captioning consistently. |
Tasks | Video Captioning |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.01452v1 |
https://arxiv.org/pdf/1906.01452v1.pdf | |
PWC | https://paperswithcode.com/paper/reconstruct-and-represent-video-contents-for |
Repo | |
Framework | |
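One way to write down the joint objective the RecNet abstract describes, combining the forward (video-to-sentence) generation loss with the backward (sentence-to-video) reconstruction loss; the trade-off weight $\lambda$ and the use of a squared-Euclidean reconstruction distance are assumptions for this schematic, not values taken from the paper.

$$\mathcal{L}(\theta, \theta_{\mathrm{rec}}) \;=\; \underbrace{-\sum_{t}\log p_{\theta}\!\left(w_{t}\mid w_{<t}, V\right)}_{\text{generation (forward) loss}} \;+\; \lambda\,\underbrace{\sum_{t}\big\lVert \hat{v}_{t}-v_{t}\big\rVert_{2}^{2}}_{\text{reconstruction (backward) loss}}$$

This joint objective is then followed, as the abstract states, by CIDEr-based fine-tuning with a policy-gradient (REINFORCE-style) update whose reward is the CIDEr score of the sampled sentence.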
Testing and verification of neural-network-based safety-critical control software: A systematic literature review
Title | Testing and verification of neural-network-based safety-critical control software: A systematic literature review |
Authors | Jin Zhang, Jingyue Li |
Abstract | Context: Neural Network (NN) algorithms have been successfully adopted in a number of Safety-Critical Cyber-Physical Systems (SCCPSs). Testing and Verification (T&V) of NN-based control software in safety-critical domains are gaining interest and attention from both software engineering and safety engineering researchers and practitioners. Objective: With the increase in studies on the T&V of NN-based control software in safety-critical domains, it is important to systematically review the state-of-the-art T&V methodologies, to classify approaches and tools that are invented, and to identify challenges and gaps for future studies. Method: We retrieved 950 papers on the T&V of NN-based Safety-Critical Control Software (SCCS). To reach our result, we filtered 83 primary papers published between 2001 and 2018, applied the thematic analysis approach for analyzing the data extracted from the selected papers, presented the classification of approaches, and identified challenges. Conclusion: The approaches were categorized into five high-order themes: assuring robustness of NNs, assuring safety properties of NN-based control software, improving the failure resilience of NNs, measuring and ensuring test completeness, and improving the interpretability of NNs. From the industry perspective, improving the interpretability of NNs is a crucial need in safety-critical applications. We also investigated nine safety integrity properties within four major safety lifecycle phases to investigate the achievement level of T&V goals in IEC 61508-3. Results show that correctness, completeness, freedom from intrinsic faults, and fault tolerance have drawn most attention from the research community. However, little effort has been invested in achieving repeatability; no reviewed study focused on precisely defined testing configuration or on defense against common cause failure. |
Tasks | |
Published | 2019-10-05 |
URL | https://arxiv.org/abs/1910.06715v2 |
https://arxiv.org/pdf/1910.06715v2.pdf | |
PWC | https://paperswithcode.com/paper/testing-and-verification-of-neural-network |
Repo | |
Framework | |
A Conditional Random Field Model for Context Aware Cloud Detection in Sky Images
Title | A Conditional Random Field Model for Context Aware Cloud Detection in Sky Images |
Authors | Vijai T. Jayadevan, Jeffrey J. Rodriguez, Alexander D. Cronin |
Abstract | A conditional random field (CRF) model for cloud detection in ground-based sky images is presented. We show that very high cloud detection accuracy can be achieved by combining a discriminative classifier and a higher-order clique potential in a CRF framework. The image is first divided into homogeneous regions using a mean-shift clustering algorithm and then a CRF model is defined over these regions. The various parameters involved are estimated using training data and the inference is performed using the Iterated Conditional Modes (ICM) algorithm. We demonstrate how taking spatial context into account can boost the accuracy. We present qualitative and quantitative results to prove the superior performance of this framework in comparison with other state-of-the-art methods for cloud detection. |
Tasks | Cloud Detection |
Published | 2019-06-18 |
URL | https://arxiv.org/abs/1906.07383v1 |
https://arxiv.org/pdf/1906.07383v1.pdf | |
PWC | https://paperswithcode.com/paper/a-conditional-random-field-model-for-context |
Repo | |
Framework | |
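A generic sketch of the ICM inference step mentioned in the abstract above: given per-region unary costs (e.g., from a discriminative classifier) and a region adjacency graph, each region's cloud/clear label is repeatedly set to the locally cheapest choice under a simple Potts smoothness term. The paper's exact potentials, including its higher-order clique term, are not reproduced; the pairwise weight and toy data below are assumptions.

```python
import numpy as np

def icm(unary, edges, pairwise_weight=0.5, n_sweeps=10):
    """Iterated Conditional Modes over a region adjacency graph.
    unary: (R, 2) costs for labelling each region as clear (0) or cloud (1).
    edges: list of (i, j) pairs of adjacent regions."""
    labels = unary.argmin(axis=1)
    neighbors = [[] for _ in range(len(unary))]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    for _ in range(n_sweeps):
        for r in range(len(unary)):
            costs = unary[r].copy()
            for lbl in (0, 1):   # Potts smoothness: penalize disagreeing neighbours
                costs[lbl] += pairwise_weight * sum(labels[n] != lbl for n in neighbors[r])
            labels[r] = costs.argmin()
    return labels

# toy usage: five regions in a chain, with an ambiguous region in the middle
unary = np.array([[0.1, 0.9], [0.2, 0.8], [0.55, 0.45], [0.8, 0.2], [0.9, 0.1]])
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(icm(unary, edges))
```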
Efficient Second-Order Shape-Constrained Function Fitting
Title | Efficient Second-Order Shape-Constrained Function Fitting |
Authors | David Durfee, Yu Gao, Anup B. Rao, Sebastian Wild |
Abstract | We give an algorithm to compute a one-dimensional shape-constrained function that best fits given data in weighted-$L_{\infty}$ norm. We give a single algorithm that works for a variety of commonly studied shape constraints including monotonicity, Lipschitz-continuity and convexity, and more generally, any shape constraint expressible by bounds on first- and/or second-order differences. Our algorithm computes an approximation with additive error $\varepsilon$ in $O\left(n \log \frac{U}{\varepsilon} \right)$ time, where $U$ captures the range of input values. We also give a simple greedy algorithm that runs in $O(n)$ time for the special case of unweighted $L_{\infty}$ convex regression. These are the first (near-)linear-time algorithms for second-order-constrained function fitting. To achieve these results, we use a novel geometric interpretation of the underlying dynamic programming problem. We further show that a generalization of the corresponding problems to directed acyclic graphs (DAGs) is as difficult as linear programming. |
Tasks | |
Published | 2019-05-06 |
URL | https://arxiv.org/abs/1905.02149v2 |
https://arxiv.org/pdf/1905.02149v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-second-order-shape-constrained |
Repo | |
Framework | |
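To give a feel for shape-constrained $L_{\infty}$ fitting, here is the classic closed form for the simplest first-order case, unweighted $L_{\infty}$ isotonic (nondecreasing) regression: average the running maximum from the left and the running minimum from the right. This only illustrates the flavour of the problem; the paper's algorithms additionally handle weights and second-order constraints such as convexity.

```python
import numpy as np

def linf_isotonic(y):
    """L_inf-optimal nondecreasing fit to y (unweighted). Both envelopes are
    nondecreasing, so their average is too, and it attains the lower bound
    max_{i<=j} (y_i - y_j) / 2 on the worst-case error."""
    upper = np.maximum.accumulate(y)              # running max from the left
    lower = np.minimum.accumulate(y[::-1])[::-1]  # running min from the right
    return (upper + lower) / 2.0

y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
fit = linf_isotonic(y)
print(fit)                          # nondecreasing
print(np.abs(fit - y).max())        # minimal worst-case absolute error (0.5 here)
```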
Policy Evaluation with Latent Confounders via Optimal Balance
Title | Policy Evaluation with Latent Confounders via Optimal Balance |
Authors | Andrew Bennett, Nathan Kallus |
Abstract | Evaluating novel contextual bandit policies using logged data is crucial in applications where exploration is costly, such as medicine. But it usually relies on the assumption of no unobserved confounders, which is bound to fail in practice. We study the question of policy evaluation when we instead have proxies for the latent confounders and develop an importance weighting method that avoids fitting a latent outcome regression model. We show that, unlike in the unconfounded case, no single set of weights can give unbiased evaluation for all outcome models; nevertheless, we propose a new algorithm that can still provably guarantee consistency by instead minimizing an adversarial balance objective. We further develop tractable algorithms for optimizing this objective and demonstrate empirically the power of our method when confounders are latent. |
Tasks | |
Published | 2019-08-06 |
URL | https://arxiv.org/abs/1908.01920v1 |
https://arxiv.org/pdf/1908.01920v1.pdf | |
PWC | https://paperswithcode.com/paper/policy-evaluation-with-latent-confounders-via |
Repo | |
Framework | |
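A schematic of the weighted evaluation estimator the abstract above refers to, with weights chosen by minimizing a worst-case (adversarial) imbalance over a function class $\mathcal{F}$. This is only meant to indicate the general shape of such a balance objective; the paper's exact formulation additionally accounts for the proxy structure of the latent confounders.

$$\hat{V}(\pi) \;=\; \frac{1}{n}\sum_{i=1}^{n} w_i\,\mathbb{1}\{A_i = \pi(X_i)\}\,Y_i, \qquad w \;=\; \arg\min_{w \ge 0}\; \sup_{f \in \mathcal{F}}\left(\frac{1}{n}\sum_{i=1}^{n} w_i\,\mathbb{1}\{A_i = \pi(X_i)\}\,f(X_i) \;-\; \frac{1}{n}\sum_{i=1}^{n} f(X_i)\right)^{2}$$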