Paper Group ANR 1762
Pioneer dataset and automatic recognition of Urdu handwritten characters using a deep autoencoder and convolutional neural network. Multi-view and Multi-source Transfers in Neural Topic Modeling with Pretrained Topic and Word Embeddings. Tackling Graphical NLP problems with Graph Recurrent Networks. M$^2$VAE - Derivation of a Multi-Modal Variationa …
Pioneer dataset and automatic recognition of Urdu handwritten characters using a deep autoencoder and convolutional neural network
Title | Pioneer dataset and automatic recognition of Urdu handwritten characters using a deep autoencoder and convolutional neural network |
Authors | Hazrat Ali, Ahsan Ullah, Talha Iqbal, Shahid Khattak |
Abstract | Automatic recognition of Urdu handwritten digits and characters is a challenging task. It has applications in postal address reading, bank cheque processing, and the digitization and preservation of old handwritten manuscripts. While there exists significant work on automatic recognition of handwritten English characters and other major languages of the world, the work done for the Urdu language is extremely insufficient. This paper has two goals. Firstly, we introduce a pioneer dataset for handwritten digits and characters of Urdu, containing samples from more than 900 individuals. Secondly, we report results for automatic recognition of handwritten digits and characters achieved by using a deep autoencoder network and a convolutional neural network. More specifically, we use a two-layer and a three-layer deep autoencoder network and a convolutional neural network, and evaluate the two frameworks in terms of recognition accuracy. The proposed deep autoencoder framework can successfully recognize digits and characters with an accuracy of 97% for digits only, 81% for characters only and 82% for both digits and characters simultaneously. In comparison, the convolutional neural network framework has an accuracy of 96.7% for digits only, 86.5% for characters only and 82.7% for both digits and characters simultaneously. These frameworks can serve as baselines for future research on Urdu handwritten text. |
Tasks | |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.07943v1 |
https://arxiv.org/pdf/1912.07943v1.pdf | |
PWC | https://paperswithcode.com/paper/pioneer-dataset-and-automatic-recognition-of |
Repo | |
Framework | |
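A minimal sketch of the kind of small CNN classifier evaluated above, assuming 28x28 grayscale inputs and a digits-only label set; the authors' exact architecture, input size and hyperparameters are not reproduced here.

```python
# Sketch only: assumed input size (28x28 grayscale) and class count, not the paper's setup.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):       # 10 = digits only (assumed)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                          # 28x28 -> 14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                          # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(64 * 7 * 7, num_classes)

    def forward(self, x):                             # x: (batch, 1, 28, 28)
        h = self.features(x)
        return self.classifier(h.flatten(1))          # raw class logits

model = SmallCNN(num_classes=10)
logits = model(torch.randn(8, 1, 28, 28))             # dummy batch of character images
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 10, (8,)))
```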
Multi-view and Multi-source Transfers in Neural Topic Modeling with Pretrained Topic and Word Embeddings
Title | Multi-view and Multi-source Transfers in Neural Topic Modeling with Pretrained Topic and Word Embeddings |
Authors | Pankaj Gupta, Yatin Chaudhary, Hinrich Schütze |
Abstract | Though word embeddings and topics are complementary representations, several past works have only used pre-trained word embeddings in (neural) topic modeling to address the data sparsity problem in short texts or small collections of documents. However, no prior work has employed (pre-trained latent) topics in a transfer learning paradigm. In this paper, we propose an approach to (1) perform knowledge transfer using latent topics obtained from a large source corpus, and (2) jointly transfer knowledge via the two representations (or views) in neural topic modeling to improve topic quality and better deal with polysemy and data sparsity issues in a target corpus. In doing so, we first accumulate topics and word representations from one or many source corpora to build a pool of topics and word vectors. Then, we identify one or multiple relevant source domains and take advantage of the corresponding topics and word features via the respective pools to guide meaningful learning in the sparse target domain. We quantify the quality of topic and document representations via generalization (perplexity), interpretability (topic coherence) and information retrieval (IR) using short-text, long-text, small and large document collections from the news and medical domains. We demonstrate state-of-the-art results on topic modeling with the proposed framework. |
Tasks | Information Retrieval, Transfer Learning, Word Embeddings |
Published | 2019-09-14 |
URL | https://arxiv.org/abs/1909.06563v2 |
https://arxiv.org/pdf/1909.06563v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-view-and-multi-source-transfers-in |
Repo | |
Framework | |
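A hedged sketch of the topic-transfer idea described above: each target topic is pulled toward its most similar topic from a pool accumulated over source corpora. The cosine matching, shapes and weighting are illustrative assumptions, not the paper's actual objective.

```python
# Sketch only: a simple regulariser guiding target topics with a source-topic pool.
import torch
import torch.nn.functional as F

def topic_transfer_penalty(target_topics, source_topic_pool, lam=0.1):
    # target_topics:     (K, V) topic-word weights being learned on the target corpus
    # source_topic_pool: (S, V) topics accumulated from one or more source corpora
    sims = F.cosine_similarity(target_topics.unsqueeze(1),
                               source_topic_pool.unsqueeze(0), dim=-1)  # (K, S)
    nearest = source_topic_pool[sims.argmax(dim=1)]   # best-matching source topic per target topic
    return lam * ((target_topics - nearest) ** 2).mean()  # pull each topic toward its match

penalty = topic_transfer_penalty(torch.rand(20, 5000, requires_grad=True),
                                 torch.rand(200, 5000))
penalty.backward()
```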
Tackling Graphical NLP problems with Graph Recurrent Networks
Title | Tackling Graphical NLP problems with Graph Recurrent Networks |
Authors | Linfeng Song |
Abstract | How to properly model graphs is a long-standing and important problem in NLP, where several popular types of graphs are knowledge graphs, semantic graphs and dependency graphs. Compared with other data structures, such as sequences and trees, graphs are generally more powerful in representing complex correlations among entities. For example, a knowledge graph stores real-world entities (such as “Barack_Obama” and “U.S.”) and their relations (such as “live_in” and “lead_by”). Properly encoding a knowledge graph is beneficial to user applications, such as question answering and knowledge discovery. Modeling graphs is also very challenging, probably because graphs usually contain massive and cyclic relations. Recent years have witnessed the success of deep learning, especially RNN-based models, on many NLP problems. Besides, RNNs and their variations have been extensively studied on several graph problems and showed preliminary successes. Despite the successes that have been achieved, RNN-based models suffer from several major drawbacks on graphs. First, they can only consume sequential data, so linearization is required to serialize input graphs, resulting in the loss of important structural information. Second, the serialization results are usually very long, so it takes a long time for RNNs to encode them. In this thesis, we propose a novel graph neural network, named graph recurrent network (GRN). We study our GRN model on four very different tasks, including machine reading comprehension, relation extraction and machine translation. Some take undirected graphs without edge labels, while others have directed ones with edge labels. To account for these important differences, we gradually enhance our GRN model, for example by further considering edge labels and adding an RNN decoder. Carefully designed experiments show the effectiveness of GRN on all these tasks. |
Tasks | Knowledge Graphs, Machine Reading Comprehension, Machine Translation, Question Answering, Reading Comprehension, Relation Extraction |
Published | 2019-07-13 |
URL | https://arxiv.org/abs/1907.06142v1 |
https://arxiv.org/pdf/1907.06142v1.pdf | |
PWC | https://paperswithcode.com/paper/tackling-graphical-nlp-problems-with-graph |
Repo | |
Framework | |
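A minimal sketch of the graph recurrent idea: each node keeps a hidden state, repeatedly aggregates its neighbours' states, and updates itself with a gated recurrent cell. Edge labels and the exact gating of the thesis' GRN are omitted; the GRU cell and shapes are assumptions.

```python
# Sketch only: mean-neighbour message passing with a gated state update.
import torch
import torch.nn as nn

class SimpleGRNLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.cell = nn.GRUCell(dim, dim)

    def forward(self, h, adj, steps: int = 3):
        # h:   (N, dim) initial node states (e.g. word/entity embeddings)
        # adj: (N, N)   adjacency matrix (1.0 where an edge exists)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        for _ in range(steps):
            msg = (adj @ h) / deg        # mean of neighbour states
            h = self.cell(msg, h)        # gated recurrent node update
        return h

layer = SimpleGRNLayer(dim=64)
h = torch.randn(5, 64)
adj = (torch.rand(5, 5) > 0.5).float()
out = layer(h, adj)                      # (5, 64) contextualised node states
```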
M$^2$VAE - Derivation of a Multi-Modal Variational Autoencoder Objective from the Marginal Joint Log-Likelihood
Title | M$^2$VAE - Derivation of a Multi-Modal Variational Autoencoder Objective from the Marginal Joint Log-Likelihood |
Authors | Timo Korthals |
Abstract | This work gives an in-depth derivation of the trainable evidence lower bound obtained from the marginal joint log-likelihood, with the goal of training a Multi-Modal Variational Autoencoder (M$^2$VAE). |
Tasks | |
Published | 2019-03-18 |
URL | http://arxiv.org/abs/1903.07303v1 |
http://arxiv.org/pdf/1903.07303v1.pdf | |
PWC | https://paperswithcode.com/paper/m2vae-derivation-of-a-multi-modal-variational |
Repo | |
Framework | |
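For orientation, the generic bi-modal evidence lower bound that such derivations build on, written for two modalities $a$ and $b$ with a joint encoder $q_\phi(z \mid a, b)$ and conditionally independent decoders; the exact M$^2$VAE objective derived in the paper may differ:

$$
\log p_\theta(a, b) \;\ge\; \mathbb{E}_{q_\phi(z \mid a, b)}\!\left[\log p_\theta(a \mid z) + \log p_\theta(b \mid z)\right] \;-\; D_{\mathrm{KL}}\!\left(q_\phi(z \mid a, b)\,\|\,p(z)\right)
$$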
Improving Cross-Domain Performance for Relation Extraction via Dependency Prediction and Information Flow Control
Title | Improving Cross-Domain Performance for Relation Extraction via Dependency Prediction and Information Flow Control |
Authors | Amir Pouran Ben Veyseh, Thien Huu Nguyen, Dejing Dou |
Abstract | Relation Extraction (RE) is one of the fundamental tasks in Information Extraction and Natural Language Processing. Dependency trees have been shown to be a very useful source of information for this task. Current deep learning models for relation extraction have mainly exploited this dependency information by guiding their computation along the structures of the dependency trees. One potential problem with this approach is that it might prevent the models from capturing important context information beyond syntactic structures and cause poor cross-domain generalization. This paper introduces a novel method to use dependency trees in RE for deep learning models that jointly predicts dependency and semantic relations. We also propose a new mechanism to control the information flow in the model based on the input entity mentions. Our extensive experiments on benchmark datasets show that the proposed model outperforms the existing methods for RE significantly. |
Tasks | Domain Generalization, Relation Extraction |
Published | 2019-07-07 |
URL | https://arxiv.org/abs/1907.03230v1 |
https://arxiv.org/pdf/1907.03230v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-cross-domain-performance-for |
Repo | |
Framework | |
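A hedged sketch of the information-flow-control idea: a gate computed from the two entity mentions modulates every token representation before relation classification. The paper's joint dependency-prediction objective is not shown; names and shapes are illustrative assumptions.

```python
# Sketch only: entity-conditioned gating of token representations.
import torch
import torch.nn as nn

class EntityGate(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, tokens, head_vec, tail_vec):
        # tokens: (T, dim) contextual token vectors; head_vec/tail_vec: (dim,) entity mentions
        control = torch.sigmoid(self.gate(torch.cat([head_vec, tail_vec])))  # (dim,) gate
        return tokens * control          # keep only context relevant to the entity pair

gate = EntityGate(dim=128)
toks = torch.randn(20, 128)
filtered = gate(toks, toks[3], toks[11])  # entity mentions at positions 3 and 11
```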
15 Keypoints Is All You Need
Title | 15 Keypoints Is All You Need |
Authors | Michael Snower, Asim Kadav, Farley Lai, Hans Peter Graf |
Abstract | Pose tracking is an important problem that requires identifying unique human pose-instances and matching them temporally across different frames of a video. However, existing pose tracking methods are unable to accurately model temporal relationships and require significant computation, often computing the tracks offline. We present an efficient Multi-person Pose Tracking method, KeyTrack, that only relies on keypoint information without using any RGB or optical flow information to track human keypoints in real-time. Keypoints are tracked using our Pose Entailment method, in which, first, a pair of pose estimates is sampled from different frames in a video and tokenized. Then, a Transformer-based network makes a binary classification as to whether one pose temporally follows another. Furthermore, we improve our top-down pose estimation method with a novel, parameter-free, keypoint refinement technique that improves the keypoint estimates used during the Pose Entailment step. We achieve state-of-the-art results on the PoseTrack’17 and the PoseTrack’18 benchmarks while using only a fraction of the computation required by most other methods for computing the tracking information. |
Tasks | Optical Flow Estimation, Pose Estimation, Pose Tracking |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.02323v2 |
https://arxiv.org/pdf/1912.02323v2.pdf | |
PWC | https://paperswithcode.com/paper/15-keypoints-is-all-you-need |
Repo | |
Framework | |
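A hedged sketch of the Pose Entailment step: keypoints from two frames are discretised into tokens, encoded by a small Transformer, and a binary head decides whether one pose temporally follows the other. The tokenisation scheme, vocabulary and model sizes are assumptions, not KeyTrack's configuration.

```python
# Sketch only: tokenised pose pair -> Transformer encoder -> binary entailment logits.
import torch
import torch.nn as nn

NUM_BINS, NUM_KP, DIM = 100, 15, 64      # spatial bins per axis, keypoints, model width (assumed)

class PoseEntailment(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_BINS * NUM_BINS, DIM)   # one token per (x, y) bin
        enc_layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.head = nn.Linear(DIM, 2)     # follows / does not follow

    def forward(self, pose_a, pose_b):
        # pose_a, pose_b: (B, NUM_KP) integer tokens from quantised keypoint positions
        tokens = torch.cat([pose_a, pose_b], dim=1)            # (B, 2*NUM_KP)
        h = self.encoder(self.embed(tokens))                   # (B, 2*NUM_KP, DIM)
        return self.head(h.mean(dim=1))                        # pooled binary logits

model = PoseEntailment()
a = torch.randint(0, NUM_BINS * NUM_BINS, (4, NUM_KP))
b = torch.randint(0, NUM_BINS * NUM_BINS, (4, NUM_KP))
logits = model(a, b)                                           # (4, 2)
```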
Multi-person Pose Tracking using Sequential Monte Carlo with Probabilistic Neural Pose Predictor
Title | Multi-person Pose Tracking using Sequential Monte Carlo with Probabilistic Neural Pose Predictor |
Authors | Masashi Okada, Shinji Takenaka, Tadahiro Taniguchi |
Abstract | It is an effective strategy for the multi-person pose tracking task in videos to employ prediction and pose matching in a frame-by-frame manner. For this type of approach, uncertainty-aware modeling is essential because precise prediction is impossible. However, previous studies have relied on only a single prediction without incorporating uncertainty, which can cause critical tracking errors if the prediction is unreliable. This paper proposes an extension to this approach with Sequential Monte Carlo (SMC). This naturally reformulates the tracking scheme to handle multiple predictions (or hypotheses) of poses, thereby mitigating the negative effect of prediction errors. An important component of SMC, i.e., the proposal distribution, is designed as a probabilistic neural pose predictor, which can propose diverse and plausible hypotheses by incorporating epistemic uncertainty and heteroscedastic aleatoric uncertainty. In addition, a recurrent architecture is introduced into our neural modeling to utilize time-sequence information of poses and manage difficult situations, such as the frequent disappearance and reappearance of poses. Compared to existing baselines, the proposed method achieves a state-of-the-art MOTA score on the PoseTrack2018 validation dataset, reducing approximately 50% of the tracking errors of a state-of-the-art baseline method. |
Tasks | Pose Tracking |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.07031v2 |
https://arxiv.org/pdf/1909.07031v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-person-pose-tracking-using-sequential |
Repo | |
Framework | |
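A hedged sketch of one Sequential Monte Carlo step as described above: multiple pose hypotheses (particles) are proposed per track, re-weighted by an observation likelihood, and resampled. The neural proposal and the likelihood are stand-in functions, not the paper's models.

```python
# Sketch only: generic particle-filter step; propose/likelihood are placeholders.
import numpy as np

def smc_step(particles, weights, propose, likelihood, detection, rng):
    # particles: (P, K, 2) pose hypotheses; weights: (P,) normalised weights
    particles = np.stack([propose(p, rng) for p in particles])      # diverse proposals
    weights = weights * np.array([likelihood(p, detection) for p in particles])
    weights = weights / weights.sum()                               # re-normalise
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

rng = np.random.default_rng(0)
P, K = 50, 15                                                       # particles, keypoints
particles = rng.normal(size=(P, K, 2))
weights = np.full(P, 1.0 / P)
propose = lambda p, rng: p + 0.05 * rng.normal(size=p.shape)        # stand-in for the neural predictor
likelihood = lambda p, det: np.exp(-np.linalg.norm(p - det))        # stand-in observation model
detection = rng.normal(size=(K, 2))
particles, weights = smc_step(particles, weights, propose, likelihood, detection, rng)
```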
Predicting 3D Human Dynamics from Video
Title | Predicting 3D Human Dynamics from Video |
Authors | Jason Y. Zhang, Panna Felsen, Angjoo Kanazawa, Jitendra Malik |
Abstract | Given a video of a person in action, we can easily guess the 3D future motion of the person. In this work, we present perhaps the first approach for predicting a future 3D mesh model sequence of a person from past video input. We do this for periodic motions such as walking and also actions like bowling and squatting seen in sports or workout videos. While there has been a surge of work on future prediction problems in computer vision, most approaches predict the 3D future from 3D past or the 2D future from 2D past inputs. In this work, we focus on the problem of predicting 3D future motion from past image sequences, which has a plethora of practical applications in autonomous systems that must operate safely around people from visual inputs. Inspired by the success of autoregressive models in language modeling tasks, we learn an intermediate latent space on which we predict the future. This effectively facilitates autoregressive predictions when the input differs from the output domain. Our approach can be trained on video sequences obtained in-the-wild without 3D ground truth labels. The project website with videos can be found at https://jasonyzhang.com/phd. |
Tasks | 3D Human Dynamics, Future prediction, Human Dynamics, Language Modelling |
Published | 2019-08-13 |
URL | https://arxiv.org/abs/1908.04781v2 |
https://arxiv.org/pdf/1908.04781v2.pdf | |
PWC | https://paperswithcode.com/paper/predicting-3d-human-dynamics-from-video |
Repo | |
Framework | |
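A hedged sketch of autoregressive prediction in a latent space: past frames are encoded into latents, a recurrent model rolls the sequence forward, and a decoder maps each predicted latent to pose parameters. The GRU and the assumed dimensions are for illustration only; the paper's architecture and mesh decoder are not reproduced.

```python
# Sketch only: roll future latents forward autoregressively, then decode to pose parameters.
import torch
import torch.nn as nn

class LatentDynamics(nn.Module):
    def __init__(self, latent_dim=256, pose_dim=85):   # dimensions are assumptions
        super().__init__()
        self.rnn = nn.GRU(latent_dim, latent_dim, batch_first=True)
        self.decode = nn.Linear(latent_dim, pose_dim)

    def forward(self, past_latents, horizon: int = 10):
        # past_latents: (B, T, latent_dim) latents extracted from past video frames
        _, h = self.rnn(past_latents)                  # summarise the observed past
        z, preds = past_latents[:, -1:], []
        for _ in range(horizon):                       # roll out the future step by step
            z, h = self.rnn(z, h)
            preds.append(self.decode(z))
        return torch.cat(preds, dim=1)                 # (B, horizon, pose_dim)

model = LatentDynamics()
future = model(torch.randn(2, 15, 256))                # predict 10 future pose-parameter vectors
```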
Movement science needs different pose tracking algorithms
Title | Movement science needs different pose tracking algorithms |
Authors | Nidhi Seethapathi, Shaofei Wang, Rachit Saluja, Gunnar Blohm, Konrad P. Kording |
Abstract | Over the last decade, computer science has made progress towards extracting body pose from single camera photographs or videos. This promises to enable movement science to detect disease, quantify movement performance, and take the science out of the lab into the real world. However, current pose tracking algorithms fall short of the needs of movement science; the types of movement data that matter are poorly estimated. For instance, the metrics currently used for evaluating pose tracking algorithms use noisy hand-labeled ground truth data and do not prioritize precision of relevant variables like three-dimensional position, velocity, acceleration, and forces which are crucial for movement science. Here, we introduce the scientific disciplines that use movement data, the types of data they need, and discuss the changes needed to make pose tracking truly transformative for movement science. |
Tasks | Pose Tracking |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10226v1 |
https://arxiv.org/pdf/1907.10226v1.pdf | |
PWC | https://paperswithcode.com/paper/movement-science-needs-different-pose |
Repo | |
Framework | |
Network2Vec Learning Node Representation Based on Space Mapping in Networks
Title | Network2Vec Learning Node Representation Based on Space Mapping in Networks |
Authors | Huang Zhenhua, Wang Zhenyu, Zhang Rui, Zhao Yangyang, Xie Xiaohui, Sharad Mehrotra |
Abstract | Representing complex networks as node adjacency matrices constrains the application of machine learning and parallel algorithms. To address this limitation, network embedding (i.e., graph representation) has been intensively studied to learn a fixed-length vector for each node in an embedding space, where the node properties in the original graph are preserved. Existing methods mainly focus on learning embedding vectors that preserve node proximity, i.e., nodes next to each other in the graph space should also be close in the embedding space, but do not enforce algebraic statistical properties to be shared between the embedding space and graph space. In this work, we propose a lightweight model, entitled Network2Vec, to learn network embedding on the basis of semantic distance mapping between the graph space and embedding space. The model builds a bridge between the two spaces leveraging the property of group homomorphism. Experiments on different learning tasks, including node classification, link prediction, and community visualization, demonstrate the effectiveness and efficiency of the new embedding method, which improves on the state-of-the-art model by up to 19% in node classification and 7% in link prediction. In addition, our method is significantly faster, consuming only a fraction of the time used by some well-known methods. |
Tasks | Link Prediction, Network Embedding, Node Classification |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10379v1 |
https://arxiv.org/pdf/1910.10379v1.pdf | |
PWC | https://paperswithcode.com/paper/network2vec-learning-node-representation |
Repo | |
Framework | |
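A hedged sketch of the distance-mapping intuition only: make pairwise distances in the embedding space mirror a chosen semantic distance in the graph space. The specific graph distance, loss form and group-homomorphism machinery of Network2Vec are not reproduced here.

```python
# Sketch only: align embedding-space geometry with a given graph-space distance matrix.
import torch

def distance_mapping_loss(embeddings, graph_dist):
    # embeddings: (N, d) node vectors; graph_dist: (N, N) semantic distances in graph space
    emb_dist = torch.cdist(embeddings, embeddings)        # pairwise embedding distances
    return ((emb_dist - graph_dist) ** 2).mean()          # push the two geometries together

emb = torch.randn(10, 16, requires_grad=True)
gd = torch.rand(10, 10)
loss = distance_mapping_loss(emb, (gd + gd.T) / 2)        # symmetrise the toy distances
loss.backward()
```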
Automatic Post-Stroke Lesion Segmentation on MR Images using 3D Residual Convolutional Neural Network
Title | Automatic Post-Stroke Lesion Segmentation on MR Images using 3D Residual Convolutional Neural Network |
Authors | Naofumi Tomita, Steven Jiang, Matthew E. Maeder, Saeed Hassanpour |
Abstract | In this paper, we demonstrate the feasibility and performance of deep residual neural networks for volumetric segmentation of irreversibly damaged brain tissue lesions on T1-weighted MRI scans for chronic stroke patients. A total of 239 T1-weighted MRI scans of chronic ischemic stroke patients from a public dataset were retrospectively analyzed by 3D deep convolutional segmentation models with residual learning, using a novel zoom-in&out strategy. Dice similarity coefficient (DSC), Average symmetric surface distance (ASSD), and Hausdorff distance (HD) of the identified lesions were measured by using the manual tracing of lesions as the reference standard. Bootstrapping was employed for all metrics to estimate 95% confidence intervals. The models were assessed on the test set of 31 scans. The average DSC was 0.64 (0.51-0.76) with a median of 0.78. ASSD and HD were 3.6 mm (1.7-6.2 mm) and 20.4 mm (10.0-33.3 mm), respectively. To the best of our knowledge, this performance is the highest achieved on this public dataset. The latest deep learning architecture and techniques were applied for 3D segmentation on MRI scans and demonstrated to be effective for volumetric segmentation of chronic ischemic stroke lesions. |
Tasks | Lesion Segmentation |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.11209v1 |
https://arxiv.org/pdf/1911.11209v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-post-stroke-lesion-segmentation-on |
Repo | |
Framework | |
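For reference, the Dice similarity coefficient reported above, written out for binary volumes (a standard definition, not the authors' code):

```python
# Standard Dice coefficient for binary 3D masks (lesion vs. background).
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    # pred, target: boolean 3D arrays of the same shape
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

a = np.zeros((32, 32, 32), dtype=bool); a[8:20, 8:20, 8:20] = True
b = np.zeros_like(a);                    b[10:22, 10:22, 10:22] = True
print(round(float(dice_coefficient(a, b)), 3))   # overlap of two toy lesion masks
```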
Exploiting user-frequency information for mining regionalisms from Social Media texts
Title | Exploiting user-frequency information for mining regionalisms from Social Media texts |
Authors | Juan Manuel Pérez, Damián E. Aleman, Santiago N. Kalinowski, Agustín Gravano |
Abstract | The task of detecting regionalisms (expressions or words used in certain regions) has traditionally relied on the use of questionnaires and surveys, and has also heavily depended on the expertise and intuition of the surveyor. The irruption of Social Media and its microblogging services has produced an unprecedented wealth of content, mainly informal text generated by users, opening new opportunities for linguists to extend their studies of language variation. Previous work on automatic detection of regionalisms depended mostly on word frequencies. In this work, we present a novel metric based on Information Theory that incorporates user frequency. We tested this metric on a corpus of Argentinian Spanish tweets in two ways: via manual annotation of the relevance of the retrieved terms, and also as a feature selection method for geolocation of users. In either case, our metric outperformed other techniques based solely on word frequency, suggesting that measuring the number of users that produce a word is informative. This tool has helped lexicographers discover several unregistered words of Argentinian Spanish, as well as different meanings assigned to registered words. |
Tasks | Feature Selection |
Published | 2019-07-10 |
URL | https://arxiv.org/abs/1907.04492v1 |
https://arxiv.org/pdf/1907.04492v1.pdf | |
PWC | https://paperswithcode.com/paper/exploiting-user-frequency-information-for |
Repo | |
Framework | |
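A hedged illustration of combining word frequency with user frequency: a word scores highly for a region when it is over-represented there and is used by many distinct users, so a single prolific account cannot inflate it. This is not the paper's information-theoretic metric, only the general idea.

```python
# Illustration only: over-representation score damped by the share of distinct users.
import math

def region_score(word_count_region, word_count_total,
                 region_tokens, total_tokens,
                 users_with_word_region, users_region):
    # How much more frequent is the word in the region than globally? (log ratio)
    rate_region = word_count_region / region_tokens
    rate_global = word_count_total / total_tokens
    over_representation = math.log(rate_region / rate_global)
    # Damp by the fraction of the region's users who actually produce the word.
    user_share = users_with_word_region / users_region
    return over_representation * user_share

print(region_score(500, 600, 1_000_000, 50_000_000, 200, 10_000))
```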
On the Constrained Least-cost Tour Problem
Title | On the Constrained Least-cost Tour Problem |
Authors | Patrick O’Hara, M. S. Ramanujan, Theodoros Damoulas |
Abstract | We introduce the Constrained Least-cost Tour (CLT) problem: given an undirected graph with weight and cost functions on the edges, minimise the total cost of a tour rooted at a start vertex such that the total weight lies within a given range. CLT is related to the family of Travelling Salesman Problems with Profits, but differs by defining the weight function on edges instead of vertices, and by requiring the total weight to be within a range instead of being at least some quota. We prove CLT is $\mathcal{NP}$-hard, even in the simple case when the input graph is a path. We derive an informative lower bound by relaxing the integrality of edges and propose a heuristic motivated by this relaxation. For the case that requires the tour to be a simple cycle, we develop two heuristics which exploit Suurballe’s algorithm to find low-cost, weight-feasible cycles. We demonstrate our algorithms by addressing a real-world problem that affects urban populations: finding routes that minimise air pollution exposure for walking, running and cycling in the city of London. |
Tasks | |
Published | 2019-06-18 |
URL | https://arxiv.org/abs/1906.07754v1 |
https://arxiv.org/pdf/1906.07754v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-constrained-least-cost-tour-problem |
Repo | |
Framework | |
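Paraphrasing the definition in the abstract, the problem asks for a tour $T$ rooted at the start vertex $s$ that minimises total cost subject to a weight window $[L, U]$:

$$
\min_{T \,\ni\, s} \; \sum_{e \in T} c(e) \quad \text{subject to} \quad L \;\le\; \sum_{e \in T} w(e) \;\le\; U
$$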
Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection
Title | Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection |
Authors | Nikolaos Gkanatsios, Vassilis Pitsikalis, Petros Koutras, Athanasia Zlatintsi, Petros Maragos |
Abstract | Detecting visual relationships, i.e. <Subject, Predicate, Object> triplets, is a challenging Scene Understanding task approached in the past via linguistic priors or spatial information in a single feature branch. We introduce a new deeply supervised two-branch architecture, the Multimodal Attentional Translation Embeddings, where the visual features of each branch are driven by a multimodal attentional mechanism that exploits spatio-linguistic similarities in a low-dimensional space. We present a variety of experiments comparing against all related approaches in the literature, as well as by re-implementing and fine-tuning several of them. Results on the commonly employed VRD dataset [1] show that the proposed method clearly outperforms all others, while we also justify our claims both quantitatively and qualitatively. |
Tasks | Scene Understanding |
Published | 2019-02-15 |
URL | http://arxiv.org/abs/1902.05829v1 |
http://arxiv.org/pdf/1902.05829v1.pdf | |
PWC | https://paperswithcode.com/paper/deeply-supervised-multimodal-attentional |
Repo | |
Framework | |
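A hedged sketch of the translation-embedding idea behind such models: project subject and object features into a low-dimensional space and score each predicate by how close its vector is to the subject-to-object shift. The multimodal attention and deep supervision of the paper are not shown; dimensions are assumptions.

```python
# Sketch only: predicate embedding should approximate (projected object - projected subject).
import torch
import torch.nn as nn

class TranslationScorer(nn.Module):
    def __init__(self, feat_dim=2048, emb_dim=128, num_predicates=70):
        super().__init__()
        self.proj_s = nn.Linear(feat_dim, emb_dim)
        self.proj_o = nn.Linear(feat_dim, emb_dim)
        self.predicates = nn.Embedding(num_predicates, emb_dim)

    def forward(self, subj_feat, obj_feat):
        # subj_feat, obj_feat: (B, feat_dim) visual features of the two boxes
        shift = self.proj_o(obj_feat) - self.proj_s(subj_feat)     # (B, emb_dim)
        # Higher score = predicate embedding closer to the subject-to-object shift
        return -torch.cdist(shift, self.predicates.weight)         # (B, num_predicates)

scorer = TranslationScorer()
scores = scorer(torch.randn(4, 2048), torch.randn(4, 2048))        # (4, 70) predicate scores
```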
Efficient Cyber Attacks Detection in Industrial Control Systems Using Lightweight Neural Networks and PCA
Title | Efficient Cyber Attacks Detection in Industrial Control Systems Using Lightweight Neural Networks and PCA |
Authors | Moshe Kravchik, Asaf Shabtai |
Abstract | Industrial control systems (ICSs) are widely used and vital to industry and society. Their failure can have severe impact on both economics and human life. Hence, these systems have become an attractive target for attacks, both physical and cyber. A number of attack detection methods have been proposed; however, they are characterized by a low detection rate, a substantial false positive rate, or are system specific. In this paper, we study an attack detection method based on simple and lightweight neural networks, namely, 1D convolutions and autoencoders. We apply these networks to both the time and frequency domains of the collected data and discuss the pros and cons of each approach. We evaluate the suggested method on three popular public datasets and achieve detection rates matching or exceeding previously published detection results, while featuring a small footprint, short training and detection times, and generality. We also demonstrate the effectiveness of PCA, which, given proper data preprocessing and feature selection, can provide high attack detection scores in many settings. Finally, we study the proposed method’s robustness against adversarial attacks that exploit inherent blind spots of neural networks to evade detection while achieving their intended physical effect. Our results show that the proposed method is robust to such evasion attacks: in order to evade detection, the attacker is forced to sacrifice the desired physical impact on the system. This finding suggests that neural networks trained under the constraints of the laws of physics can be trusted more than networks trained under more flexible conditions. |
Tasks | Feature Selection |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01216v2 |
https://arxiv.org/pdf/1907.01216v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-cyber-attacks-detection-in |
Repo | |
Framework | |
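A hedged sketch of the PCA route mentioned above: fit PCA on benign operation data and flag windows whose reconstruction error exceeds a threshold calibrated on that data. Preprocessing, window construction and the threshold rule are assumptions, not the paper's pipeline.

```python
# Sketch only: PCA reconstruction-error anomaly detection on synthetic stand-in data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
normal = rng.normal(size=(1000, 30))                     # stand-in for benign sensor readings
test = np.vstack([rng.normal(size=(50, 30)),
                  rng.normal(loc=3.0, size=(5, 30))])    # last rows mimic an attack

scaler = StandardScaler().fit(normal)
pca = PCA(n_components=10).fit(scaler.transform(normal))

def reconstruction_error(X):
    Z = pca.transform(scaler.transform(X))
    X_hat = pca.inverse_transform(Z)
    return ((scaler.transform(X) - X_hat) ** 2).mean(axis=1)

threshold = np.quantile(reconstruction_error(normal), 0.99)   # calibrated on benign data only
alerts = reconstruction_error(test) > threshold
print(int(alerts.sum()), "windows flagged")
```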