Paper Group ANR 566
Network Implosion: Effective Model Compression for ResNets via Static Layer Pruning and Retraining. Prediction of Construction Cost for Field Canals Improvement Projects in Egypt. Knowledge Tracing with Sequential Key-Value Memory Networks. Deep Knowledge Tracing with Side Information. Distributed Training with Heterogeneous Data: Bridging Median- …
Network Implosion: Effective Model Compression for ResNets via Static Layer Pruning and Retraining
Title | Network Implosion: Effective Model Compression for ResNets via Static Layer Pruning and Retraining |
Authors | Yasutoshi Ida, Yasuhiro Fujiwara |
Abstract | Residual Networks with convolutional layers are widely used in the field of machine learning. Since they effectively extract features from input data by stacking multiple layers, they can achieve high accuracy in many applications. However, the stacking of many layers raises their computation costs. To address this problem, we propose Network Implosion, which erases multiple layers from Residual Networks without degrading accuracy. Our key idea is to introduce a priority term that identifies the importance of a layer; we can select unimportant layers according to the priority and erase them after the training. In addition, we retrain the networks to avoid critical drops in accuracy after layer erasure. A theoretical assessment reveals that our erasure and retraining scheme can erase layers without accuracy drop, and achieve higher accuracy than is possible with training from scratch. Our experiments show that Network Implosion can, for classification on CIFAR-10/100 and ImageNet, reduce the number of layers by 24.00 to 42.86 percent without any drop in accuracy. |
Tasks | Model Compression |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03826v1 |
https://arxiv.org/pdf/1906.03826v1.pdf | |
PWC | https://paperswithcode.com/paper/network-implosion-effective-model-compression |
Repo | |
Framework | |
Prediction of Construction Cost for Field Canals Improvement Projects in Egypt
Title | Prediction of Construction Cost for Field Canals Improvement Projects in Egypt |
Authors | Haytham H. Elmousalami |
Abstract | Field canals improvement projects (FCIPs) are one of the ambitious projects constructed to save fresh water. To finance this project, conceptual cost models are important to accurately predict preliminary costs at the early stages of the project. The first step is to develop a conceptual cost model to identify key cost drivers affecting the project. Therefore, input variable selection remains an important part of model development, as poor variable selection can decrease model precision. The study discovered the most important drivers of FCIPs based on a qualitative approach and a quantitative approach. Subsequently, the study has developed a parametric cost model based on machine learning methods such as regression methods, artificial neural networks, fuzzy model and case-based reasoning. |
Tasks | |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.11804v1 |
https://arxiv.org/pdf/1905.11804v1.pdf | |
PWC | https://paperswithcode.com/paper/190511804 |
Repo | |
Framework | |
Knowledge Tracing with Sequential Key-Value Memory Networks
Title | Knowledge Tracing with Sequential Key-Value Memory Networks |
Authors | Ghodai Abdelrahman, Qing Wang |
Abstract | Can machines trace human knowledge like humans? Knowledge tracing (KT) is a fundamental task in a wide range of applications in education, such as massive open online courses (MOOCs), intelligent tutoring systems, educational games, and learning management systems. It models dynamics in a student’s knowledge states in relation to different learning concepts through their interactions with learning activities. Recently, several attempts have been made to use deep learning models for tackling the KT problem. Although these deep learning models have shown promising results, they have limitations: they either lack the ability to go deeper to trace how specific concepts in a knowledge state are mastered by a student, or fail to capture long-term dependencies in an exercise sequence. In this paper, we address these limitations by proposing a novel deep learning model for knowledge tracing, namely Sequential Key-Value Memory Networks (SKVMN). This model unifies the strengths of recurrent modelling capacity and memory capacity of the existing deep learning KT models for modelling student learning. We have extensively evaluated our proposed model on five benchmark datasets. The experimental results show that (1) SKVMN outperforms the state-of-the-art KT models on all datasets, (2) SKVMN can better discover the correlation between latent concepts and questions, and (3) SKVMN can trace the knowledge state of students dynamically, and leverage sequential dependencies in an exercise sequence for improved prediction accuracy. |
Tasks | Knowledge Tracing |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1910.13197v1 |
https://arxiv.org/pdf/1910.13197v1.pdf | |
PWC | https://paperswithcode.com/paper/191013197 |
Repo | |
Framework | |
Deep Knowledge Tracing with Side Information
Title | Deep Knowledge Tracing with Side Information |
Authors | Zhiwei Wang, Xiaoqin Feng, Jiliang Tang, Gale Yan Huang, Zitao Liu |
Abstract | Monitoring student knowledge states or skill acquisition levels, known as knowledge tracing, is a fundamental part of intelligent tutoring systems. Despite its inherent challenges, recent deep neural network based knowledge tracing models have achieved great success, which is largely due to the models’ ability to learn sequential dependencies of questions in student exercise data. However, in addition to sequential information, questions inherently exhibit side relations, which can enrich our understanding of student knowledge states and have great potential to advance knowledge tracing. Thus, in this paper, we exploit side relations to improve knowledge tracing and design a novel framework DTKS. The experimental results on real education data validate the effectiveness of the proposed framework and demonstrate the importance of side information in knowledge tracing. |
Tasks | Knowledge Tracing |
Published | 2019-09-01 |
URL | https://arxiv.org/abs/1909.00372v1 |
https://arxiv.org/pdf/1909.00372v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-knowledge-tracing-with-side-information |
Repo | |
Framework | |
Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms
Title | Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms |
Authors | Xiangyi Chen, Tiancong Chen, Haoran Sun, Zhiwei Steven Wu, Mingyi Hong |
Abstract | Recently, there has been growing interest in the study of median-based algorithms for distributed non-convex optimization. Two prominent such algorithms are signSGD with majority vote, an effective approach for communication reduction via 1-bit compression on the local gradients, and medianSGD, an algorithm recently proposed to ensure robustness against Byzantine workers. The convergence analyses for these algorithms critically rely on the assumption that all the distributed data are drawn iid from the same distribution. However, in applications such as Federated Learning, the data across different nodes or machines can be inherently heterogeneous, which violates such an iid assumption. This work analyzes signSGD and medianSGD in distributed settings with heterogeneous data. We show that these algorithms are non-convergent whenever there is some disparity between the expected median and mean over the local gradients. To overcome this gap, we provide a novel gradient correction mechanism that perturbs the local gradients with noise, together with a series of results that provably close the gap between mean and median of the gradients. The proposed methods largely preserve nice properties of these methods, such as the low per-iteration communication complexity of signSGD, and further enjoy global convergence to stationary solutions. Our perturbation technique can be of independent interest when one wishes to estimate mean through a median estimator. |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01736v2 |
https://arxiv.org/pdf/1906.01736v2.pdf | |
PWC | https://paperswithcode.com/paper/distributed-training-with-heterogeneous-data |
Repo | |
Framework | |
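The disparity between median-based and mean-based aggregation described in the abstract above can be illustrated with a minimal sketch. The three one-dimensional worker gradients are made-up values chosen for illustration, and the helper names are hypothetical; the paper's noise-based correction is not reproduced here.

```python
from statistics import mean, median
from typing import List


def sign(x: float) -> int:
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def majority_vote(local_grads: List[List[float]]) -> List[int]:
    """signSGD with majority vote: each worker sends only the sign of every
    gradient coordinate, and the server keeps the sign of the summed votes."""
    dim = len(local_grads[0])
    return [sign(sum(sign(g[j]) for g in local_grads)) for j in range(dim)]


def coordinate_median(local_grads: List[List[float]]) -> List[float]:
    """Coordinate-wise median of the local gradients (medianSGD-style)."""
    dim = len(local_grads[0])
    return [median([g[j] for g in local_grads]) for j in range(dim)]


def coordinate_mean(local_grads: List[List[float]]) -> List[float]:
    """Coordinate-wise mean of the local gradients (plain distributed SGD)."""
    dim = len(local_grads[0])
    return [mean([g[j] for g in local_grads]) for j in range(dim)]


# Hypothetical heterogeneous workers: two small negative gradients, one large positive.
grads = [[-1.0], [-1.0], [5.0]]
vote = majority_vote(grads)      # sign chosen by the majority of workers
med = coordinate_median(grads)   # median of the local gradients
avg = coordinate_mean(grads)     # mean of the local gradients
```

In this toy case the mean gradient is positive while both the median and the majority vote are negative, so a median-based update would move in the opposite direction to the mean-based one.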
Learning Sparse Neural Networks via $\ell_0$ and T$\ell_1$ by a Relaxed Variable Splitting Method with Application to Multi-scale Curve Classification
Title | Learning Sparse Neural Networks via $\ell_0$ and T$\ell_1$ by a Relaxed Variable Splitting Method with Application to Multi-scale Curve Classification |
Authors | Fanghui Xue, Jack Xin |
Abstract | We study sparsification of convolutional neural networks (CNN) by a relaxed variable splitting method of $\ell_0$ and transformed-$\ell_1$ (T$\ell_1$) penalties, with application to complex curves such as texts written in different fonts, and words written with trembling hands simulating those of Parkinson’s disease patients. The CNN contains 3 convolutional layers, each followed by a maximum pooling, and finally a fully connected layer which contains the largest number of network weights. With the $\ell_0$ penalty, we achieved over 99% test accuracy in distinguishing shaky vs. regular fonts or handwriting, with above 86% of the weights in the fully connected layer being zero. Comparable sparsity and test accuracy are also reached with a proper choice of T$\ell_1$ penalty. |
Tasks | |
Published | 2019-02-20 |
URL | http://arxiv.org/abs/1902.07419v1 |
http://arxiv.org/pdf/1902.07419v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-sparse-neural-networks-via-ell_0-and |
Repo | |
Framework | |
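For reference, the abstract above does not define the transformed-$\ell_1$ penalty. The form commonly used in the literature, assumed here to be the one intended, applies a one-parameter function to each weight $w_i$, with the parameter $a > 0$ chosen by the user:

$$
\rho_a(x) = \frac{(a+1)\,|x|}{a + |x|}, \qquad \mathrm{T}\ell_1(w) = \sum_i \rho_a(w_i).
$$

Under this definition the penalty interpolates between the two familiar cases:

$$
\lim_{a \to 0^+} \rho_a(x) = \mathbf{1}_{\{x \neq 0\}}, \qquad \lim_{a \to \infty} \rho_a(x) = |x|,
$$

so a small $a$ behaves like $\ell_0$ and a large $a$ behaves like $\ell_1$.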
Misleading Authorship Attribution of Source Code using Adversarial Learning
Title | Misleading Authorship Attribution of Source Code using Adversarial Learning |
Authors | Erwin Quiring, Alwin Maier, Konrad Rieck |
Abstract | In this paper, we present a novel attack against authorship attribution of source code. We exploit that recent attribution methods rest on machine learning and thus can be deceived by adversarial examples of source code. Our attack performs a series of semantics-preserving code transformations that mislead learning-based attribution but appear plausible to a developer. The attack is guided by Monte-Carlo tree search that enables us to operate in the discrete domain of source code. In an empirical evaluation with source code from 204 programmers, we demonstrate that our attack has a substantial effect on two recent attribution methods, whose accuracy drops from over 88% to 1% under attack. Furthermore, we show that our attack can imitate the coding style of developers with high accuracy and thereby induce false attributions. We conclude that current approaches for authorship attribution are inappropriate for practical application and there is a need for resilient analysis techniques. |
Tasks | |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12386v2 |
https://arxiv.org/pdf/1905.12386v2.pdf | |
PWC | https://paperswithcode.com/paper/misleading-authorship-attribution-of-source |
Repo | |
Framework | |
Maximum Causal Entropy Specification Inference from Demonstrations
Title | Maximum Causal Entropy Specification Inference from Demonstrations |
Authors | Marcell Vazquez-Chanlatte, Sanjit A. Seshia |
Abstract | In many settings (e.g., robotics) demonstrations provide a natural way to specify tasks; however, most methods for learning from demonstrations either do not provide guarantees that the artifacts learned for the tasks, such as rewards or policies, can be safely composed and/or do not explicitly capture history dependencies. Motivated by this deficit, recent works have proposed learning Boolean task specifications, a class of Boolean non-Markovian rewards which admit well-defined composition and explicitly handle historical dependencies. This work continues this line of research by adapting maximum causal entropy inverse reinforcement learning to estimate the posterior probability of a specification given a multi-set of demonstrations. The key algorithmic insight is to leverage the extensive literature and tooling on reduced ordered binary decision diagrams to efficiently encode a time-unrolled Markov Decision Process. This enables transforming a naive exponential time algorithm into a polynomial time algorithm. |
Tasks | |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11792v4 |
https://arxiv.org/pdf/1907.11792v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-task-specifications-from-2 |
Repo | |
Framework | |
Tracing Player Knowledge in a Parallel Programming Educational Game
Title | Tracing Player Knowledge in a Parallel Programming Educational Game |
Authors | Pavan Kantharaju, Katelyn Alderfer, Jichen Zhu, Bruce Char, Brian Smith, Santiago Ontañón |
Abstract | This paper focuses on “tracing player knowledge” in educational games. Specifically, given a set of concepts or skills required to master a game, the goal is to estimate the likelihood with which the current player has mastery of each of those concepts or skills. The main contribution of the paper is an approach that integrates machine learning and domain knowledge rules to find when the player applied a certain skill and either succeeded or failed. This is then given as input to a standard knowledge tracing module (such as those from Intelligent Tutoring Systems) to perform knowledge tracing. We evaluate our approach in the context of an educational game called “Parallel” to teach parallel and concurrent programming with data collected from real users, showing our approach can predict students’ skills with a low mean-squared error. |
Tasks | Knowledge Tracing |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05632v1 |
https://arxiv.org/pdf/1908.05632v1.pdf | |
PWC | https://paperswithcode.com/paper/tracing-player-knowledge-in-a-parallel |
Repo | |
Framework | |
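The abstract above feeds detected successes and failures into "a standard knowledge tracing module" without naming one. As a hedged illustration, the sketch below uses Bayesian Knowledge Tracing, a common choice in intelligent tutoring systems; that choice, the function name `bkt_update`, and the default slip, guess, and learn probabilities are assumptions, not details taken from the paper.

```python
def bkt_update(
    p_known: float,
    correct: bool,
    p_slip: float = 0.1,    # assumed chance of failing despite knowing the skill
    p_guess: float = 0.2,   # assumed chance of succeeding without knowing it
    p_learn: float = 0.15,  # assumed chance of learning the skill after an attempt
) -> float:
    """One Bayesian Knowledge Tracing step for a single skill.

    First condition the mastery probability on the observed outcome (Bayes'
    rule), then account for the chance that the skill was learned afterwards.
    """
    if correct:
        evidence = p_known * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_known) * p_guess)
    else:
        evidence = p_known * p_slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
    return posterior + (1 - posterior) * p_learn


# Starting from an uninformative prior of 0.5:
after_success = bkt_update(0.5, correct=True)
after_failure = bkt_update(0.5, correct=False)
```

A detected success raises the mastery estimate and a detected failure lowers it; running the update over a player's sequence of detected skill applications yields the per-skill mastery likelihood the abstract refers to.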
Trial of an AI: Empowering people to explore law and science challenges
Title | Trial of an AI: Empowering people to explore law and science challenges |
Authors | Gaudron Arthur |
Abstract | Artificial Intelligence represents many things: a new market to conquer or a quality label for tech companies, a threat for traditional industries, a menace for democracy, or a blessing for our busy everyday life. The press abounds in examples illustrating these aspects, but one should not draw hasty and premature conclusions. The first successes in AI have been a surprise for society at large, including researchers in the field. Today, after the initial stupefaction, we have examples of the system reactions: traditional companies are heavily investing in AI, social platforms are monitored during elections, data collection is more and more regulated, etc. The resilience of an organization (i.e. its capacity to resist a shock) relies deeply on the perception of its environment. Future problems have to be anticipated, while unforeseen events have to be quickly identified in order to be mitigated as fast as possible. The author states that this clear perception starts with a common definition of AI in terms of capacities and limits. AI practitioners should make notions and concepts accessible to the general public and the impacted fields (e.g. industries, law, education). It is a truism that only law experts would have the potential to estimate AI impacts on the judicial system. However, questions remain on how to connect different kinds of expertise and what is the appropriate level of detail required for the knowledge exchanges. And the same consideration is true for dissemination towards society. Ultimately, society will live with decisions made by the “experts”. It sounds wise to involve society in the decision process rather than risk paying the consequences later. Therefore, society also needs the key concepts to understand the impact of AI on their life.
This was the purpose of the trial of an AI that took place in October 2018 at the Court of Appeal of Paris: gathering experts from various fields to expose challenges in law and science to a general public. |
Tasks | |
Published | 2019-03-05 |
URL | http://arxiv.org/abs/1903.09518v1 |
http://arxiv.org/pdf/1903.09518v1.pdf | |
PWC | https://paperswithcode.com/paper/trial-of-an-ai-empowering-people-to-explore |
Repo | |
Framework | |
Efficient Dynamic WFST Decoding for Personalized Language Models
Title | Efficient Dynamic WFST Decoding for Personalized Language Models |
Authors | Jun Liu, Jiedan Zhu, Vishal Kathuria, Fuchun Peng |
Abstract | We propose a two-layer cache mechanism to speed up dynamic WFST decoding with personalized language models. The first layer is a public cache that stores most of the static part of the graph. This is shared globally among all users. A second layer is a private cache that caches the graph that represents the personalized language model, which is only shared by the utterances from a particular user. We also propose two simple yet effective pre-initialization methods, one based on breadth-first search, and another based on a data-driven exploration of decoder states using previous utterances. Experiments with a calling speech recognition task using a personalized contact list demonstrate that the proposed public cache reduces decoding time by a factor of three compared to decoding without pre-initialization. Using the private cache provides additional efficiency gains, reducing the decoding time by a factor of five. |
Tasks | Language Modelling, Speech Recognition |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10670v1 |
https://arxiv.org/pdf/1910.10670v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-dynamic-wfst-decoding-for |
Repo | |
Framework | |
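The two-layer cache and the breadth-first pre-initialization described in the abstract above can be sketched as follows. This is a toy model under stated assumptions, not the authors' decoder: the class `TwoLayerCache`, the three callbacks, and the tiny `STATIC` and `CONTACTS` graphs are all hypothetical, and a real WFST state expansion is replaced by a dictionary lookup.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List

State = Hashable


class TwoLayerCache:
    """Toy two-layer cache of expanded decoder states (illustrative only)."""

    def __init__(
        self,
        expand_static: Callable[[State], List[State]],
        expand_personal: Callable[[str, State], List[State]],
        is_personal: Callable[[State], bool],
    ) -> None:
        self._expand_static = expand_static
        self._expand_personal = expand_personal
        self._is_personal = is_personal
        self.public: Dict[State, List[State]] = {}              # shared by all users
        self.private: Dict[str, Dict[State, List[State]]] = {}  # one cache per user
        self.expansions = 0  # number of cache misses, i.e. real expansions

    def successors(self, user: str, state: State) -> List[State]:
        """Return the successors of `state`, expanding it only on a cache miss."""
        if self._is_personal(state):
            cache = self.private.setdefault(user, {})
            if state not in cache:
                cache[state] = self._expand_personal(user, state)
                self.expansions += 1
            return cache[state]
        if state not in self.public:
            self.public[state] = self._expand_static(state)
            self.expansions += 1
        return self.public[state]

    def prewarm_public(self, start: State, max_states: int) -> None:
        """Fill the public cache by breadth-first search over static states."""
        queue, seen = deque([start]), {start}
        while queue and len(self.public) < max_states:
            state = queue.popleft()
            if self._is_personal(state):
                continue  # personalized states are left to the private cache
            for nxt in self.successors("", state):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)


# Hypothetical static graph and per-user contact lists.
STATIC = {"start": ["call", "start"], "call": ["@contact"]}
CONTACTS = {"alice": ["bob", "carol"], "dave": ["erin"]}

cache = TwoLayerCache(
    expand_static=lambda s: STATIC[s],
    expand_personal=lambda user, s: CONTACTS[user],
    is_personal=lambda s: s == "@contact",
)
cache.prewarm_public("start", max_states=10)
warm_expansions = cache.expansions            # static states expanded up front
alice_names = cache.successors("alice", "@contact")
dave_names = cache.successors("dave", "@contact")
```

Keeping the personalized states out of the public cache is what lets the static part be shared safely across users, while each user's contact-list expansion is reused only across that user's own utterances.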
High-dimensional Nonlinear Profile Monitoring based on Deep Probabilistic Autoencoders
Title | High-dimensional Nonlinear Profile Monitoring based on Deep Probabilistic Autoencoders |
Authors | Nurettin Sergin, Hao Yan |
Abstract | Wide accessibility of imaging and profile sensors in modern industrial systems created an abundance of high-dimensional sensing variables. This led to a growing interest in the research of high-dimensional process monitoring. However, most of the approaches in the literature assume the in-control population to lie on a linear manifold with a given basis (i.e., spline, wavelet, kernel, etc.) or an unknown basis (i.e., principal component analysis and its variants), which cannot be used to efficiently model profiles with a nonlinear manifold, which is common in many real-life cases. We propose deep probabilistic autoencoders as a viable unsupervised learning approach to model such manifolds. To do so, we formulate nonlinear and probabilistic extensions of the monitoring statistics from classical approaches as the expected reconstruction error (ERE) and the KL-divergence (KLD) based monitoring statistics. Through an extensive simulation study, we provide insights on why latent-space based statistics are unreliable and why residual-space based ones typically perform much better for deep learning based approaches. Finally, we demonstrate the superiority of deep probabilistic models via both a simulation study and a real-life case study involving images of defects from a hot steel rolling process. |
Tasks | |
Published | 2019-11-01 |
URL | https://arxiv.org/abs/1911.00482v1 |
https://arxiv.org/pdf/1911.00482v1.pdf | |
PWC | https://paperswithcode.com/paper/high-dimensional-nonlinear-profile-monitoring |
Repo | |
Framework | |
Towards Federated Graph Learning for Collaborative Financial Crimes Detection
Title | Towards Federated Graph Learning for Collaborative Financial Crimes Detection |
Authors | Toyotaro Suzumura, Yi Zhou, Natahalie Baracaldo, Guangnan Ye, Keith Houck, Ryo Kawahara, Ali Anwar, Lucia Larise Stavarache, Yuji Watanabe, Pablo Loyola, Daniel Klyashtorny, Heiko Ludwig, Kumar Bhaskaran |
Abstract | Financial crime is a large and growing problem, in some way touching almost every financial institution. Financial institutions are the front line in the war against financial crime and, accordingly, must devote substantial human and technology resources to this effort. Current processes to detect financial misconduct have limitations in their ability to effectively differentiate between malicious behavior and ordinary financial activity. These limitations tend to result in gross over-reporting of suspicious activity that necessitates time-intensive and costly manual review. Advances in technology used in this domain, including machine learning based approaches, can improve upon the effectiveness of financial institutions’ existing processes. However, a key challenge that most financial institutions continue to face is that they address financial crimes in isolation without any insight from other firms. Where financial institutions address financial crimes through the lens of their own firm, perpetrators may devise sophisticated strategies that may span across institutions and geographies. Financial institutions continue to work relentlessly to advance their capabilities, forming partnerships across institutions to share insights, patterns and capabilities. These public-private partnerships are subject to stringent regulatory and data privacy requirements, thereby making it difficult to rely on traditional technology solutions. In this paper, we propose a methodology to share key information across institutions by using a federated graph learning platform that enables us to build more accurate machine learning models by leveraging federated learning and also graph learning approaches. We demonstrated that our federated model outperforms the local model by 20% with the UK FCA TechSprint data set. This new platform opens up a door to efficiently detecting global money laundering activity. |
Tasks | |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.12946v2 |
https://arxiv.org/pdf/1909.12946v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-federated-graph-learning-for |
Repo | |
Framework | |
Ellipsoidal Trust Region Methods for Neural Network Training
Title | Ellipsoidal Trust Region Methods for Neural Network Training |
Authors | Leonard Adolphs, Jonas Kohler, Aurelien Lucchi |
Abstract | We investigate the use of ellipsoidal trust region constraints for second-order optimization of neural networks. This approach can be seen as a higher-order counterpart of adaptive gradient methods, which we here show to be interpretable as first-order trust region methods with ellipsoidal constraints. In particular, we show that the preconditioning matrix used in RMSProp and Adam satisfies the necessary conditions for convergence of (first- and) second-order trust region methods and report that this ellipsoidal constraint consistently outperforms its spherical counterpart in practice. |
Tasks | |
Published | 2019-05-22 |
URL | https://arxiv.org/abs/1905.09201v3 |
https://arxiv.org/pdf/1905.09201v3.pdf | |
PWC | https://paperswithcode.com/paper/ellipsoidal-trust-region-methods-and-the |
Repo | |
Framework | |
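The reading of adaptive gradient methods as first-order trust region methods, mentioned in the abstract above, can be made concrete with a short derivation. The symbols are assumptions introduced for illustration: $g$ is the gradient, $\Delta > 0$ the trust region radius, and $A$ a symmetric positive definite matrix defining the ellipsoid. The first-order subproblem is

$$
\min_{s} \; g^\top s \quad \text{subject to} \quad \|s\|_A = \sqrt{s^\top A s} \le \Delta ,
$$

and for $g \neq 0$ its minimizer lies on the boundary of the ellipsoid:

$$
s^\star = -\,\Delta \, \frac{A^{-1} g}{\sqrt{g^\top A^{-1} g}} .
$$

If $A$ is taken to be diagonal with entries $\sqrt{v_i} + \epsilon$, where $v_i$ is a running average of squared gradients, the step direction is $-g_i / (\sqrt{v_i} + \epsilon)$ in each coordinate, which is the RMSProp update direction. With $A = I$ the constraint is spherical and the step reduces to normalized gradient descent.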
Instance Cross Entropy for Deep Metric Learning
Title | Instance Cross Entropy for Deep Metric Learning |
Authors | Xinshao Wang, Elyor Kodirov, Yang Hua, Neil Robertson |
Abstract | Loss functions play a crucial role in deep metric learning, thus a variety of them have been proposed. Some supervise the learning process by pairwise or tripletwise similarity constraints while others take advantage of structured similarity information among multiple data points. In this work, we approach deep metric learning from a novel perspective. We propose instance cross entropy (ICE) which measures the difference between an estimated instance-level matching distribution and its ground-truth one. ICE has three main appealing properties. Firstly, similar to categorical cross entropy (CCE), ICE has a clear probabilistic interpretation and exploits structured semantic similarity information for learning supervision. Secondly, ICE is scalable to infinite training data as it learns on mini-batches iteratively and is independent of the training set size. Thirdly, motivated by our relative weight analysis, seamless sample reweighting is incorporated. It rescales samples’ gradients to control the differentiation degree over training examples instead of truncating them by sample mining. In addition to its simplicity and intuitiveness, extensive experiments on three real-world benchmarks demonstrate the superiority of ICE. |
Tasks | Metric Learning, Semantic Similarity, Semantic Textual Similarity |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.09976v1 |
https://arxiv.org/pdf/1911.09976v1.pdf | |
PWC | https://paperswithcode.com/paper/instance-cross-entropy-for-deep-metric-1 |
Repo | |
Framework | |