Paper Group ANR 867
DeepSquare: Boosting the Learning Power of Deep Convolutional Neural Networks with Elementwise Square Operators. Recurrent Neural Networks for Time Series Forecasting. Implicit Filter Sparsification In Convolutional Neural Networks. Domain Agnostic Learning with Disentangled Representations. Joint Embedding Learning of Educational Knowledge Graphs. …
DeepSquare: Boosting the Learning Power of Deep Convolutional Neural Networks with Elementwise Square Operators
Title | DeepSquare: Boosting the Learning Power of Deep Convolutional Neural Networks with Elementwise Square Operators |
Authors | Sheng Chen, Xu Wang, Chao Chen, Yifan Lu, Xijin Zhang, Linfu Wen |
Abstract | Modern neural network modules that significantly enhance learning power usually add considerable computational complexity to the original neural networks. In this paper, we pursue very efficient neural network modules that can significantly boost the learning power of deep convolutional neural networks with negligible extra computational cost. We first show, both theoretically and experimentally, that the elementwise square operator has the potential to enhance the learning power of neural networks. We then design four types of lightweight modules built on elementwise square operators, named Square-Pooling, Square-Softmin, Square-Excitation, and Square-Encoding. We add our four lightweight modules to ResNet-18, ResNet-50, and ShuffleNetV2 and evaluate them on the ImageNet 2012 dataset. The experimental results show that our modules bring significant accuracy improvements to the base convolutional neural network models. The performance of our lightweight modules is even comparable to much more complicated modules such as bilinear pooling, Squeeze-and-Excitation, and Gather-Excite. Our highly efficient modules are particularly suitable for mobile models. For example, when equipped with a single Square-Pooling module, the top-1 classification accuracy of ShuffleNetV2-0.5x on ImageNet 2012 improves by an absolute 1.45% with no additional parameters and negligible inference-time overhead. |
Tasks | |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.04979v1 |
https://arxiv.org/pdf/1906.04979v1.pdf | |
PWC | https://paperswithcode.com/paper/deepsquare-boosting-the-learning-power-of |
Repo | |
Framework | |
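The abstract does not spell out the internal design of the four modules, so the following is only a minimal PyTorch-style sketch of one plausible elementwise-square gating block (a Square-Excitation-like design); the class name and wiring are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SquareExcitationSketch(nn.Module):
    """Hypothetical sketch: summarize each channel by the mean of its squared
    activations (its second moment) and use that statistic to rescale the
    feature map. The actual DeepSquare modules may be wired differently."""
    def forward(self, x):                                # x: (N, C, H, W)
        energy = (x * x).mean(dim=(2, 3), keepdim=True)  # per-channel second moment
        return x * torch.sigmoid(energy)                 # gate channels by their energy
```

The appeal of such a block is that it adds no learnable parameters, which matches the abstract's claim about the parameter-free Square-Pooling variant.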
Recurrent Neural Networks for Time Series Forecasting
Title | Recurrent Neural Networks for Time Series Forecasting |
Authors | Gábor Petneházi |
Abstract | Time series forecasting is difficult. It is difficult even for recurrent neural networks with their inherent ability to learn sequentiality. This article presents a recurrent neural network based time series forecasting framework covering feature engineering, feature importances, point and interval predictions, and forecast evaluation. The description of the method is followed by an empirical study using both LSTM and GRU networks. |
Tasks | Feature Engineering, Time Series, Time Series Forecasting |
Published | 2019-01-01 |
URL | http://arxiv.org/abs/1901.00069v1 |
http://arxiv.org/pdf/1901.00069v1.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-neural-networks-for-time-series |
Repo | |
Framework | |
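As a minimal illustration of the kind of recurrent forecaster the framework covers, here is a sketch of a one-step-ahead LSTM point predictor over sliding windows; the feature engineering, interval prediction, and evaluation components described in the paper are omitted, and the layer sizes are placeholders.

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """One-step-ahead point forecaster over a sliding window of features."""
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, window):            # window: (batch, timesteps, n_features)
        out, _ = self.rnn(window)
        return self.head(out[:, -1])      # predict the next value from the last state
```

Swapping `nn.LSTM` for `nn.GRU` gives the GRU variant used in the paper's empirical comparison.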
Implicit Filter Sparsification In Convolutional Neural Networks
Title | Implicit Filter Sparsification In Convolutional Neural Networks |
Authors | Dushyant Mehta, Kwang In Kim, Christian Theobalt |
Abstract | We show that implicit filter-level sparsity manifests in convolutional neural networks (CNNs) that employ Batch Normalization and ReLU activation and are trained with adaptive gradient descent techniques and L2 regularization or weight decay. Through an extensive empirical study (Mehta et al., 2019) we hypothesize the mechanism behind the sparsification process, and find surprising links to certain filter sparsification heuristics proposed in the literature. The emergence, and subsequent pruning, of selective features is observed to be one of the contributing mechanisms, leading to feature sparsity on par with or better than certain explicit sparsification/pruning approaches. In this workshop article we summarize our findings and point out corollaries of selective-feature penalization which could also be employed as heuristics for filter pruning. |
Tasks | L2 Regularization |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.04967v1 |
https://arxiv.org/pdf/1905.04967v1.pdf | |
PWC | https://paperswithcode.com/paper/implicit-filter-sparsification-in |
Repo | |
Framework | |
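A simple way to observe the implicit filter sparsity described above is to count BatchNorm scale parameters that have collapsed toward zero after training. The sketch below is a rough diagnostic, assuming a PyTorch model and an arbitrary threshold that is not taken from the paper.

```python
import torch.nn as nn

def bn_filter_sparsity(model: nn.Module, tol: float = 1e-3) -> float:
    """Fraction of BatchNorm filters whose learned scale |gamma| is below tol,
    a crude proxy for implicitly pruned filters."""
    total, dead = 0, 0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            gamma = m.weight.detach().abs()
            total += gamma.numel()
            dead += int((gamma < tol).sum())
    return dead / max(total, 1)
```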
Domain Agnostic Learning with Disentangled Representations
Title | Domain Agnostic Learning with Disentangled Representations |
Authors | Xingchao Peng, Zijun Huang, Ximeng Sun, Kate Saenko |
Abstract | Unsupervised model transfer has the potential to greatly improve the generalizability of deep models to novel domains. Yet the current literature assumes that the separation of target data into distinct domains is known a priori. In this paper, we propose the task of Domain-Agnostic Learning (DAL): how to transfer knowledge from a labeled source domain to unlabeled data from arbitrary target domains? To tackle this problem, we devise a novel Deep Adversarial Disentangled Autoencoder (DADA) capable of disentangling domain-specific features from class identity. We demonstrate experimentally that when the target domain labels are unknown, DADA leads to state-of-the-art performance on several image classification datasets. |
Tasks | Image Classification |
Published | 2019-04-28 |
URL | http://arxiv.org/abs/1904.12347v1 |
http://arxiv.org/pdf/1904.12347v1.pdf | |
PWC | https://paperswithcode.com/paper/domain-agnostic-learning-with-disentangled |
Repo | |
Framework | |
Joint Embedding Learning of Educational Knowledge Graphs
Title | Joint Embedding Learning of Educational Knowledge Graphs |
Authors | Siyu Yao, Ruijie Wang, Shen Sun, Derui Bu, Jun Liu |
Abstract | As an efficient model for knowledge organization, the knowledge graph has been widely adopted in several fields, e.g., biomedicine, sociology, and education, and there is a steady trend of learning embedding representations of knowledge graphs to facilitate knowledge graph construction and downstream tasks. In general, knowledge graph embedding techniques aim to learn vectorized representations that preserve the structural information of the graph, and conventional embedding-learning models rely on structural relationships among entities and relations. In educational knowledge graphs, however, structural relationships are not the focus; instead, the rich literals of the graphs are more valuable. In this paper, we focus on this problem and propose a novel model for embedding learning of educational knowledge graphs. Our model considers both structural and literal information and jointly learns embedding representations. Three experimental graphs were constructed from an educational knowledge graph that has been applied in real-world teaching. We conducted two experiments on the three graphs and on other common benchmark graphs. The experimental results demonstrate the effectiveness of our model and its superiority over other baselines when processing educational knowledge graphs. |
Tasks | graph construction, Graph Embedding, Knowledge Graph Embedding, Knowledge Graphs |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08776v2 |
https://arxiv.org/pdf/1911.08776v2.pdf | |
PWC | https://paperswithcode.com/paper/joint-embedding-learning-of-educational |
Repo | |
Framework | |
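The abstract contrasts structural and literal information. As a hedged illustration of what a joint objective could look like, the sketch below combines a TransE-style structural score with a literal-similarity term; the actual model, composition of literals, and weighting in the paper are not specified here and may differ.

```python
import numpy as np

def transe_score(h, r, t):
    """Structural plausibility of a triple (head, relation, tail) under a
    translational model: higher (less negative) is more plausible."""
    return -np.linalg.norm(h + r - t)

def joint_score(h, r, t, h_literal, t_literal, alpha=0.5):
    """Illustrative joint score: structural term plus cosine similarity between
    literal-derived vectors (e.g., text embeddings of entity descriptions).
    alpha is a hypothetical mixing weight."""
    literal_sim = float(h_literal @ t_literal /
                        (np.linalg.norm(h_literal) * np.linalg.norm(t_literal) + 1e-12))
    return alpha * transe_score(h, r, t) + (1 - alpha) * literal_sim
```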
Improving Slot Filling by Utilizing Contextual Information
Title | Improving Slot Filling by Utilizing Contextual Information |
Authors | Amir Pouran Ben Veyseh, Franck Dernoncourt, Thien Huu Nguyen |
Abstract | Slot filling is the task of extracting semantic concepts from a given natural language utterance. It has recently been shown that using contextual information, either in word representations (e.g., BERT embeddings) or in the computation graph of the model, can improve model performance. However, recent work uses contextual information in a restricted manner, e.g., by concatenating a word representation with its context feature vector, which limits the model from learning any direct association between the context and the label of the word. We introduce a new deep model that utilizes the contextual information for each word in the given sentence in a multi-task setting. Our model enforces consistency between the feature vectors of the context and the word while increasing the expressiveness of the context about the label of the word. Our empirical analysis on a slot filling dataset shows the superiority of the model over the baselines. |
Tasks | Slot Filling |
Published | 2019-11-05 |
URL | https://arxiv.org/abs/1911.01680v1 |
https://arxiv.org/pdf/1911.01680v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-slot-filling-by-utilizing |
Repo | |
Framework | |
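The abstract describes enforcing consistency between a word's representation and the representation of its context in a multi-task setup. The sketch below shows one plausible auxiliary consistency loss; the paper's exact formulation is not given here, and the loss weight is a hypothetical hyperparameter.

```python
import torch
import torch.nn.functional as F

def consistency_loss(word_vecs, context_vecs):
    """Encourage each word representation to agree with the representation of
    its surrounding context (one plausible choice: 1 - cosine similarity)."""
    cos = F.cosine_similarity(word_vecs, context_vecs, dim=-1)  # (batch, seq_len)
    return (1.0 - cos).mean()

# Multi-task objective (illustrative):
# total_loss = tagging_loss + lambda_consistency * consistency_loss(words, contexts)
```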
A Temporal Sequence Learning for Action Recognition and Prediction
Title | A Temporal Sequence Learning for Action Recognition and Prediction |
Authors | Sangwoo Cho, Hassan Foroosh |
Abstract | In this work (supported in part by the National Science Foundation under grant IIS-1212948), we present a method to represent a video as a sequence of words, and to learn the temporal sequencing of such words as the key information for predicting and recognizing human actions. We leverage core concepts from the Natural Language Processing (NLP) literature used in sentence classification to solve the problems of action prediction and action recognition. Each frame is converted into a word that is represented as a vector using the Bag of Visual Words (BoW) encoding method. The words are then combined into a sentence that represents the video. The sequences of words in different actions are learned with a simple but effective Temporal Convolutional Neural Network (T-CNN) that captures the temporal sequencing of information in a video sentence. We demonstrate that a key characteristic of the proposed method is its low latency, i.e., its ability to predict an action accurately from a partial sequence (sentence). Experiments on two datasets, UCF101 and HMDB51, show that the method on average reaches 95% of its accuracy within half the video frames. The results also demonstrate that our method achieves comparable state-of-the-art performance in action recognition (i.e., at the completion of the sentence) in addition to action prediction. |
Tasks | Sentence Classification |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.06813v1 |
https://arxiv.org/pdf/1906.06813v1.pdf | |
PWC | https://paperswithcode.com/paper/a-temporal-sequence-learning-for-action |
Repo | |
Framework | |
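To make the pipeline concrete, here is a small sketch of a temporal CNN classifying a video "sentence" of per-frame visual-word indices, in the spirit of CNN sentence classifiers; the layer sizes and single-convolution design are illustrative assumptions, not the paper's exact T-CNN.

```python
import torch
import torch.nn as nn

class TCNNSketch(nn.Module):
    """Classify a sequence of per-frame 'visual words' with 1-D temporal convolutions."""
    def __init__(self, vocab_size: int, n_classes: int, emb: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.conv = nn.Conv1d(emb, 64, kernel_size=3, padding=1)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, word_ids):                  # word_ids: (batch, frames)
        x = self.embed(word_ids).transpose(1, 2)  # (batch, emb, frames)
        x = torch.relu(self.conv(x)).max(dim=2).values  # max-pool over time
        return self.fc(x)
```

Feeding only a prefix of the frame sequence to the same model corresponds to the low-latency, partial-sentence prediction setting described above.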
Optimal Collusion-Free Teaching
Title | Optimal Collusion-Free Teaching |
Authors | David Kirkpatrick, Hans U. Simon, Sandra Zilles |
Abstract | Formal models of learning from teachers need to respect certain criteria to avoid collusion. The most commonly accepted notion of collusion-freeness was proposed by Goldman and Mathias (1996), and various teaching models obeying their criterion have been studied. For each model $M$ and each concept class $\mathcal{C}$, a parameter $M$-$\mathrm{TD}(\mathcal{C})$ refers to the teaching dimension of concept class $\mathcal{C}$ in model $M$—defined to be the number of examples required for teaching a concept, in the worst case over all concepts in $\mathcal{C}$. This paper introduces a new model of teaching, called no-clash teaching, together with the corresponding parameter $\mathrm{NCTD}(\mathcal{C})$. No-clash teaching is provably optimal in the strong sense that, given any concept class $\mathcal{C}$ and any model $M$ obeying Goldman and Mathias’s collusion-freeness criterion, one obtains $\mathrm{NCTD}(\mathcal{C})\le M$-$\mathrm{TD}(\mathcal{C})$. We also study a corresponding notion $\mathrm{NCTD}^+$ for the case of learning from positive data only, establish useful bounds on $\mathrm{NCTD}$ and $\mathrm{NCTD}^+$, and discuss relations of these parameters to the VC-dimension and to sample compression. In addition to formulating an optimal model of collusion-free teaching, our main results are on the computational complexity of deciding whether $\mathrm{NCTD}^+(\mathcal{C})=k$ (or $\mathrm{NCTD}(\mathcal{C})=k$) for given $\mathcal{C}$ and $k$. We show some such decision problems to be equivalent to the existence question for certain constrained matchings in bipartite graphs. Our NP-hardness results for the latter are of independent interest in the study of constrained graph matchings. |
Tasks | |
Published | 2019-03-10 |
URL | http://arxiv.org/abs/1903.04012v1 |
http://arxiv.org/pdf/1903.04012v1.pdf | |
PWC | https://paperswithcode.com/paper/optimal-collusion-free-teaching |
Repo | |
Framework | |
Measuring the compositionality of noun-noun compounds over time
Title | Measuring the compositionality of noun-noun compounds over time |
Authors | Prajit Dhar, Janis Pagel, Lonneke van der Plas |
Abstract | We present work in progress on the temporal progression of compositionality in noun-noun compounds. Previous work has proposed computational methods for determining the compositionality of compounds. These methods try to determine automatically how transparent the meaning of the compound as a whole is with respect to the meaning of its parts. We hypothesize that such a property might change over time. We use the time-stamped Google Books corpus for our diachronic investigations, and first examine whether the vector-based semantic spaces extracted from this corpus are able to predict compositionality ratings, despite their inherent limitations. We find that using temporal information helps predict the ratings, although correlation with the ratings is lower than reported for other corpora. Finally, we show changes in compositionality over time for a selection of compounds. |
Tasks | |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02563v2 |
https://arxiv.org/pdf/1906.02563v2.pdf | |
PWC | https://paperswithcode.com/paper/measuring-the-compositionality-of-noun-noun |
Repo | |
Framework | |
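A common way to score compositionality, consistent with the description above, is to compare the compound's distributional vector with a vector composed from its parts. The sketch below uses additive composition, which is one of several composition functions in the literature and not necessarily the one the paper adopts.

```python
import numpy as np

def compositionality_score(compound_vec, modifier_vec, head_vec):
    """Cosine similarity between the observed compound vector and the sum of
    its constituents' vectors; higher means more transparent (compositional)."""
    composed = modifier_vec + head_vec
    denom = np.linalg.norm(compound_vec) * np.linalg.norm(composed) + 1e-12
    return float(compound_vec @ composed / denom)

# Computing this score per time slice of the time-stamped corpus traces how
# compositionality changes over time for a given compound.
```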
I-SAFE: Instant Suspicious Activity identiFication at the Edge using Fuzzy Decision Making
Title | I-SAFE: Instant Suspicious Activity identiFication at the Edge using Fuzzy Decision Making |
Authors | Seyed Yahya Nikouei, Yu Chen, Alexander Aved, Erik Blasch, Timothy R. Faughnan |
Abstract | Urban imagery usually serves forensic analysis and, by design, is available for incident mitigation. As more imagery is collected, it becomes harder to narrow thousands of video clips down to the frames relevant to a specific incident. A real-time, proactive surveillance system is desirable: one that could instantly detect dubious personnel, identify suspicious activities, or raise momentous alerts. The recent proliferation of the edge computing paradigm allows more data-intensive tasks to be accomplished by smart edge devices with lightweight but powerful algorithms. This paper presents a forensic surveillance strategy by introducing Instant Suspicious Activity identiFication at the Edge (I-SAFE) using fuzzy decision making. A fuzzy control system is proposed to mimic the decision-making process of a security officer. Decisions are made based on video features extracted by a lightweight Deep Machine Learning (DML) model. Based on requirements from first-line law enforcement officers, several features are selected and fuzzified to cope with the uncertainty inherent in the officers' decision-making process. Using features in the edge hierarchy minimizes communication delay so that instant alerting is achieved. Additionally, by leveraging the microservices architecture, the I-SAFE scheme possesses good scalability given the increasing complexity at the network edge. Implemented as an edge-based application and tested on various labeled surveillance video datasets, the I-SAFE scheme raises alerts by identifying suspicious activity in an average of 0.002 seconds. Compared to four other state-of-the-art methods over two other datasets, the experimental study verified the superiority of the decentralized I-SAFE method. |
Tasks | Decision Making |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05776v1 |
https://arxiv.org/pdf/1909.05776v1.pdf | |
PWC | https://paperswithcode.com/paper/i-safe-instant-suspicious-activity |
Repo | |
Framework | |
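As a toy illustration of the fuzzy-control idea, the sketch below fuzzifies two video-derived features with triangular membership functions and fires a single Mamdani-style rule. The features, breakpoints, and rule are invented placeholders, not the ones elicited from officers in the paper.

```python
def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function peaking at b on support [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def suspicion(loiter_seconds: float, speed_mps: float) -> float:
    """Toy rule: IF loitering is long AND speed is low THEN raise suspicion.
    Returns the rule's firing strength in [0, 1]."""
    long_loiter = tri(loiter_seconds, 30.0, 120.0, 300.0)
    low_speed = tri(speed_mps, 0.0, 0.2, 1.0)
    return min(long_loiter, low_speed)  # fuzzy AND via min
```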
Improved Res2Net model for Person re-identification
Title | Improved Res2Net model for Person re-identification |
Authors | Zongjing Cao, Hyo Jong Lee |
Abstract | Person re-identification has become a very popular research topic in the computer vision community owing to its numerous applications and growing importance in visual surveillance. Person re-identification remains challenging due to occlusion, illumination, and significant intra-class variations across different cameras. In this paper, we propose a multi-task network based on an improved Res2Net model that simultaneously computes the identification loss and the verification loss of two pedestrian images. Given a pair of pedestrian images, the system predicts the identities of the two input images and whether they belong to the same identity. To obtain deeper feature information about pedestrians, we propose to use the latest Res2Net model for feature extraction from each input image. Experiments on several large-scale person re-identification benchmark datasets demonstrate the accuracy of our approach. For example, rank-1 accuracies are 83.18% (+1.38) and 93.14% (+0.84) for the DukeMTMC and Market-1501 datasets, respectively. The proposed method shows encouraging improvements compared with state-of-the-art methods. |
Tasks | Large-Scale Person Re-Identification, Person Re-Identification |
Published | 2019-10-08 |
URL | https://arxiv.org/abs/1910.04061v2 |
https://arxiv.org/pdf/1910.04061v2.pdf | |
PWC | https://paperswithcode.com/paper/person-re-identification-based-on-res2net |
Repo | |
Framework | |
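The abstract describes a multi-task head that jointly computes identification and verification losses for an image pair. Below is a compact sketch of such a head on top of a generic backbone; the Res2Net backbone, the pairwise comparison operator, and the loss details are abstracted away and are assumptions for illustration.

```python
import torch.nn as nn

class ReIDHeads(nn.Module):
    """Identification (who is it?) and verification (same person?) heads
    sharing backbone features from a pair of pedestrian images."""
    def __init__(self, feat_dim: int, n_identities: int):
        super().__init__()
        self.id_head = nn.Linear(feat_dim, n_identities)
        self.verif_head = nn.Linear(feat_dim, 2)  # same / different

    def forward(self, f1, f2):                    # f1, f2: (batch, feat_dim)
        id_logits1 = self.id_head(f1)             # identification branch, image 1
        id_logits2 = self.id_head(f2)             # identification branch, image 2
        verif_logits = self.verif_head((f1 - f2).abs())  # verification on the pair
        return id_logits1, id_logits2, verif_logits
```

Cross-entropy on the two identification outputs plus cross-entropy on the verification output gives the joint multi-task objective sketched above.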
Augmenting Data with Mixup for Sentence Classification: An Empirical Study
Title | Augmenting Data with Mixup for Sentence Classification: An Empirical Study |
Authors | Hongyu Guo, Yongyi Mao, Richong Zhang |
Abstract | Mixup, a recently proposed data augmentation method that linearly interpolates the inputs and modeling targets of random sample pairs, has demonstrated its capability to significantly improve the predictive accuracy of state-of-the-art networks for image classification. However, how this technique can be applied to natural language processing (NLP) tasks, and how effective it is there, has not been investigated. In this paper, we propose two strategies for adapting Mixup to sentence classification: one performs interpolation on word embeddings and the other on sentence embeddings. We conduct experiments to evaluate our methods using several benchmark datasets. Our studies show that such interpolation strategies serve as an effective, domain-independent data augmentation approach for sentence classification, and can yield significant accuracy improvements for both CNN and LSTM models. |
Tasks | Data Augmentation, Image Classification, Sentence Classification, Sentence Embeddings, Word Embeddings |
Published | 2019-05-22 |
URL | https://arxiv.org/abs/1905.08941v1 |
https://arxiv.org/pdf/1905.08941v1.pdf | |
PWC | https://paperswithcode.com/paper/augmenting-data-with-mixup-for-sentence |
Repo | |
Framework | |
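Both strategies described above reduce to the same interpolation step, applied either to padded word-embedding matrices or to sentence-embedding vectors. A minimal sketch of that step follows; the Beta-distribution parameter is a typical Mixup choice, not necessarily the value used in the paper.

```python
import numpy as np

def mixup_pair(x1, x2, y1, y2, alpha=0.2):
    """Linearly interpolate two embedding inputs and their one-hot labels.
    x1, x2 may be word-embedding matrices (seq_len x dim) or sentence vectors."""
    lam = np.random.beta(alpha, alpha)
    x_mix = lam * x1 + (1.0 - lam) * x2
    y_mix = lam * y1 + (1.0 - lam) * y2
    return x_mix, y_mix
```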
Transfer Learning for Nonparametric Classification: Minimax Rate and Adaptive Classifier
Title | Transfer Learning for Nonparametric Classification: Minimax Rate and Adaptive Classifier |
Authors | T. Tony Cai, Hongji Wei |
Abstract | Human learners have the natural ability to use knowledge gained in one setting for learning in a different but related setting. This ability to transfer knowledge from one task to another is essential for effective learning. In this paper, we study transfer learning in the context of nonparametric classification based on observations from different distributions under the posterior drift model, which is a general framework and arises in many practical problems. We first establish the minimax rate of convergence and construct a rate-optimal two-sample weighted $K$-NN classifier. The results characterize precisely the contribution of the observations from the source distribution to the classification task under the target distribution. A data-driven adaptive classifier is then proposed and is shown to simultaneously attain within a logarithmic factor of the optimal rate over a large collection of parameter spaces. Simulation studies and real data applications are carried out where the numerical results further illustrate the theoretical analysis. Extensions to the case of multiple source distributions are also considered. |
Tasks | Transfer Learning |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.02903v1 |
https://arxiv.org/pdf/1906.02903v1.pdf | |
PWC | https://paperswithcode.com/paper/transfer-learning-for-nonparametric |
Repo | |
Framework | |
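To make the two-sample idea concrete, here is a sketch of a weighted K-NN classifier that pools neighbours from the source and target samples but votes them with different weights. In the paper the neighbour counts and weights are derived from the posterior-drift model to achieve the minimax rate; here they are fixed inputs chosen purely for illustration.

```python
import numpy as np

def two_sample_knn_predict(x, X_src, y_src, X_tgt, y_tgt,
                           k_src=10, k_tgt=3, w_src=0.5):
    """Binary prediction in {0, 1}: weighted vote of the k_src nearest source
    points and the k_tgt nearest target points. Labels are 0/1 numpy arrays."""
    def knn_vote(X, y, k):
        d = np.linalg.norm(X - x, axis=1)   # distances to the query point
        idx = np.argsort(d)[:k]             # indices of the k nearest neighbours
        return y[idx].mean()                # fraction of positive neighbours
    vote = w_src * knn_vote(X_src, y_src, k_src) + (1 - w_src) * knn_vote(X_tgt, y_tgt, k_tgt)
    return int(vote >= 0.5)
```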
Automated Weed Detection in Aerial Imagery with Context
Title | Automated Weed Detection in Aerial Imagery with Context |
Authors | Delia Bullock, Andrew Mangeni, Tyr Wiesner-Hanks, Chad DeChant, Ethan L. Stewart, Nicholas Kaczmar, Judith M. Kolkman, Rebecca J. Nelson, Michael A. Gore, Hod Lipson |
Abstract | In this paper, we demonstrate the ability to discriminate between cultivated maize plants and grass or grass-like weed image segments using the context surrounding the image segments. While convolutional neural networks have achieved state-of-the-art accuracies in object detection, errors arise when objects in different classes share similar features. This scenario often occurs when objects in images are viewed at too small a scale to discern distinct differences in features, causing images to be incorrectly classified or localized. To solve this problem, we explore using context when classifying image segments. This technique involves feeding a convolutional neural network a central square image along with a border of its direct surroundings at train and test time. Although images are labelled at a smaller scale to preserve accurate localization, the network classifies the images and learns features that include the wider context. We demonstrate the benefits of this context technique on the object detection task through a case study of grass (foxtail) and grass-like (yellow nutsedge) weed detection in maize fields. In this setting, adding context alone nearly halved the error of the neural network, from 7.1% to 4.3%. After only one epoch with context, the network also achieved a higher accuracy than the network without context did after 50 epochs. The benefits of the context technique are likely to be particularly evident in agricultural settings, in which parts (such as leaves) of several plants may appear similar when the context in which those parts appear is not taken into account. |
Tasks | Object Detection |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00652v3 |
https://arxiv.org/pdf/1910.00652v3.pdf | |
PWC | https://paperswithcode.com/paper/automated-crabgrass-detection-in-aerial |
Repo | |
Framework | |
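The context technique amounts to cropping a larger window than the labelled segment so the network sees the surroundings while the label still refers only to the inner patch. A sketch of that cropping step follows; the patch and border sizes are illustrative, not the ones used in the paper.

```python
import numpy as np

def crop_with_context(image: np.ndarray, cx: int, cy: int,
                      inner: int = 64, border: int = 32) -> np.ndarray:
    """Return the crop covering the labelled inner patch centred at (cx, cy)
    plus a border of surrounding context, clipped to the image bounds."""
    half = inner // 2 + border
    h, w = image.shape[:2]
    y0, y1 = max(cy - half, 0), min(cy + half, h)
    x0, x1 = max(cx - half, 0), min(cx + half, w)
    return image[y0:y1, x0:x1]
```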
An Approach for Process Model Extraction By Multi-Grained Text Classification
Title | An Approach for Process Model Extraction By Multi-Grained Text Classification |
Authors | Chen Qian, Lijie Wen, Akhil Kumar, Leilei Lin, Li Lin, Zan Zong, Shuang Li, Jianmin Wang |
Abstract | Process model extraction (PME) is a recently emerged interdisciplinary task between natural language processing (NLP) and business process management (BPM) that aims to extract process models from textual descriptions. Previous process extractors depend heavily on manual features and ignore the potential relations between clues at different text granularities. In this paper, we formalize the PME task as a multi-grained text classification problem and propose a hierarchical neural network to effectively model and extract multi-grained information without manually defined procedural features. Under this structure, we propose a coarse-to-fine (grained) learning mechanism that trains the multi-grained tasks in coarse-to-fine order so that high-level knowledge is shared with the low-level tasks. To evaluate our approach, we construct two multi-grained datasets from two different domains and conduct extensive experiments along different dimensions. The experimental results demonstrate that our approach outperforms the state-of-the-art methods with statistical significance, and further investigations demonstrate its effectiveness. |
Tasks | Multi-Task Learning, Semantic Role Labeling, Sentence Classification, Text Classification |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1906.02127v3 |
https://arxiv.org/pdf/1906.02127v3.pdf | |
PWC | https://paperswithcode.com/paper/190602127 |
Repo | |
Framework | |
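A compact sketch of the multi-grained setup: a shared sentence encoder with separate coarse- and fine-grained classification heads, trained in coarse-to-fine order. The encoder type, layer sizes, and two-stage training note are illustrative assumptions rather than the paper's exact architecture.

```python
import torch.nn as nn

class MultiGrainedClassifier(nn.Module):
    """Shared sentence encoder with a coarse head (e.g., is this sentence
    process-relevant?) and a fine head (e.g., which relation does it express?)."""
    def __init__(self, vocab_size, n_coarse, n_fine, emb=128, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.coarse_head = nn.Linear(hidden, n_coarse)
        self.fine_head = nn.Linear(hidden, n_fine)

    def forward(self, token_ids):                  # token_ids: (batch, seq_len)
        _, h = self.encoder(self.embed(token_ids))
        h = h.squeeze(0)                           # (batch, hidden)
        return self.coarse_head(h), self.fine_head(h)

# Coarse-to-fine training (illustrative): first optimise only the coarse loss,
# then continue training with the fine-grained loss so the fine-grained task
# reuses the knowledge captured by the shared encoder.
```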