January 29, 2020


Paper Group ANR 566


Network Implosion: Effective Model Compression for ResNets via Static Layer Pruning and Retraining

Title Network Implosion: Effective Model Compression for ResNets via Static Layer Pruning and Retraining
Authors Yasutoshi Ida, Yasuhiro Fujiwara
Abstract Residual Networks with convolutional layers are widely used in the field of machine learning. Since they effectively extract features from input data by stacking multiple layers, they can achieve high accuracy in many applications. However, stacking many layers raises their computation costs. To address this problem, we propose Network Implosion, which erases multiple layers from Residual Networks without degrading accuracy. Our key idea is to introduce a priority term that identifies the importance of a layer; we can select unimportant layers according to the priority and erase them after training. In addition, we retrain the networks to avoid critical drops in accuracy after layer erasure. A theoretical assessment reveals that our erasure and retraining scheme can erase layers without an accuracy drop, and can achieve higher accuracy than is possible with training from scratch. Our experiments show that Network Implosion can, for classification on Cifar-10/100 and ImageNet, reduce the number of layers by 24.00 to 42.86 percent without any drop in accuracy.
Tasks Model Compression
Published 2019-06-10
URL https://arxiv.org/abs/1906.03826v1
PDF https://arxiv.org/pdf/1906.03826v1.pdf
PWC https://paperswithcode.com/paper/network-implosion-effective-model-compression
Repo
Framework
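The erase-then-retrain loop can be illustrated with a small sketch. The priority values and the selection rule below are hypothetical stand-ins (the paper learns its priority term during training, and retraining is omitted here):

```python
# Hypothetical sketch of priority-based layer erasure: each residual block
# carries a priority score; the lowest-priority blocks are erased, after
# which the network would be retrained (retraining not shown).

def erase_layers(priorities, n_erase):
    """Return indices of surviving layers after erasing the n_erase
    lowest-priority ones, preserving the order of the remaining layers."""
    ranked = sorted(range(len(priorities)), key=lambda i: priorities[i])
    erased = set(ranked[:n_erase])
    return [i for i in range(len(priorities)) if i not in erased]

# Toy example: 7 residual blocks with made-up priority scores.
priorities = [0.9, 0.1, 0.7, 0.05, 0.8, 0.3, 0.6]
survivors = erase_layers(priorities, n_erase=2)  # blocks 1 and 3 erased
```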

Prediction of Construction Cost for Field Canals Improvement Projects in Egypt

Title Prediction of Construction Cost for Field Canals Improvement Projects in Egypt
Authors Haytham H. Elmousalami
Abstract Field canals improvement projects (FCIPs) are among the ambitious projects constructed to save fresh water. To finance such projects, conceptual cost models are important for accurately predicting preliminary costs at the early stages of a project. The first step is to develop a conceptual cost model to identify the key cost drivers affecting the project. Input variable selection therefore remains an important part of model development, as poor variable selection can decrease model precision. The study identified the most important drivers of FCIPs based on both a qualitative and a quantitative approach. Subsequently, the study developed a parametric cost model based on machine learning methods such as regression methods, artificial neural networks, fuzzy models and case-based reasoning.
Tasks
Published 2019-05-20
URL https://arxiv.org/abs/1905.11804v1
PDF https://arxiv.org/pdf/1905.11804v1.pdf
PWC https://paperswithcode.com/paper/190511804
Repo
Framework
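The regression branch of such a parametric cost model can be sketched as ordinary least squares on a few cost drivers. The driver names and all numbers below are illustrative placeholders, not data or variables from the paper:

```python
import numpy as np

# Minimal parametric cost-model sketch: least squares on two hypothetical
# cost drivers (say, canal length and served area). Data are made up.
X = np.array([[1.0, 2.0, 50.0],
              [1.0, 3.5, 80.0],
              [1.0, 1.2, 30.0],
              [1.0, 4.0, 95.0]])           # leading column of ones = intercept
y = np.array([120.0, 190.0, 80.0, 225.0])  # observed project costs (made up)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # fit intercept + 2 drivers
predicted = X @ coef                          # in-sample cost predictions
```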

Knowledge Tracing with Sequential Key-Value Memory Networks

Title Knowledge Tracing with Sequential Key-Value Memory Networks
Authors Ghodai Abdelrahman, Qing Wang
Abstract Can machines trace human knowledge like humans? Knowledge tracing (KT) is a fundamental task in a wide range of applications in education, such as massive open online courses (MOOCs), intelligent tutoring systems, educational games, and learning management systems. It models the dynamics of a student’s knowledge states in relation to different learning concepts through their interactions with learning activities. Recently, several attempts have been made to use deep learning models for tackling the KT problem. Although these deep learning models have shown promising results, they have limitations: they either lack the ability to go deeper to trace how specific concepts in a knowledge state are mastered by a student, or fail to capture long-term dependencies in an exercise sequence. In this paper, we address these limitations by proposing a novel deep learning model for knowledge tracing, namely Sequential Key-Value Memory Networks (SKVMN). This model unifies the strengths of the recurrent modelling capacity and the memory capacity of existing deep learning KT models for modelling student learning. We have extensively evaluated our proposed model on five benchmark datasets. The experimental results show that (1) SKVMN outperforms the state-of-the-art KT models on all datasets, (2) SKVMN can better discover the correlation between latent concepts and questions, and (3) SKVMN can dynamically trace students’ knowledge states and leverage sequential dependencies in an exercise sequence for improved prediction accuracy.
Tasks Knowledge Tracing
Published 2019-10-29
URL https://arxiv.org/abs/1910.13197v1
PDF https://arxiv.org/pdf/1910.13197v1.pdf
PWC https://paperswithcode.com/paper/191013197
Repo
Framework
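The core key-value memory operation can be sketched as an attention-weighted read: correlate a query (e.g. an exercise embedding) with key slots, then read from the value memory. This is a generic sketch of the mechanism, not SKVMN's exact architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

def memory_read(query, keys, values):
    """Attention read over a key-value memory: correlate the query with
    each key slot, then return the attention-weighted sum of value slots."""
    w = softmax(keys @ query)  # correlation weights over memory slots
    return w @ values          # weighted read from the value memory

keys = np.array([[1.0, 0.0], [0.0, 1.0]])    # 2 slots, dim-2 keys
values = np.array([[5.0, 0.0], [0.0, 5.0]])  # matching value slots
q = np.array([10.0, 0.0])                    # query close to slot 0
read = memory_read(q, keys, values)          # read dominated by slot 0
```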

Deep Knowledge Tracing with Side Information

Title Deep Knowledge Tracing with Side Information
Authors Zhiwei Wang, Xiaoqin Feng, Jiliang Tang, Gale Yan Huang, Zitao Liu
Abstract Monitoring student knowledge states or skill acquisition levels, known as knowledge tracing, is a fundamental part of intelligent tutoring systems. Despite its inherent challenges, recent deep neural network based knowledge tracing models have achieved great success, largely owing to the models’ ability to learn sequential dependencies of questions in student exercise data. However, in addition to sequential information, questions inherently exhibit side relations, which can enrich our understanding of student knowledge states and have great potential to advance knowledge tracing. Thus, in this paper, we exploit side relations to improve knowledge tracing and design a novel framework, DTKS. The experimental results on real education data validate the effectiveness of the proposed framework and demonstrate the importance of side information in knowledge tracing.
Tasks Knowledge Tracing
Published 2019-09-01
URL https://arxiv.org/abs/1909.00372v1
PDF https://arxiv.org/pdf/1909.00372v1.pdf
PWC https://paperswithcode.com/paper/deep-knowledge-tracing-with-side-information
Repo
Framework

Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms

Title Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms
Authors Xiangyi Chen, Tiancong Chen, Haoran Sun, Zhiwei Steven Wu, Mingyi Hong
Abstract Recently, there has been growing interest in the study of median-based algorithms for distributed non-convex optimization. Two prominent such algorithms are signSGD with majority vote, an effective approach for communication reduction via 1-bit compression of the local gradients, and medianSGD, an algorithm recently proposed to ensure robustness against Byzantine workers. The convergence analyses for these algorithms critically rely on the assumption that all the distributed data are drawn iid from the same distribution. However, in applications such as Federated Learning, the data across different nodes or machines can be inherently heterogeneous, which violates such an iid assumption. This work analyzes signSGD and medianSGD in distributed settings with heterogeneous data. We show that these algorithms are non-convergent whenever there is some disparity between the expected median and mean of the local gradients. To overcome this gap, we provide a novel gradient correction mechanism that perturbs the local gradients with noise, together with a series of results that provably close the gap between the mean and the median of the gradients. The proposed methods largely preserve the desirable properties of these algorithms, such as the low per-iteration communication complexity of signSGD, and further enjoy global convergence to stationary solutions. Our perturbation technique can be of independent interest when one wishes to estimate the mean through a median estimator.
Tasks
Published 2019-06-04
URL https://arxiv.org/abs/1906.01736v2
PDF https://arxiv.org/pdf/1906.01736v2.pdf
PWC https://paperswithcode.com/paper/distributed-training-with-heterogeneous-data
Repo
Framework
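The median-mean disparity, and the smoothing effect of noise perturbation, can be seen in a tiny simulation. The worker gradients and noise scale below are illustrative, and this is only a numerical sketch of the phenomenon, not the paper's mechanism or analysis:

```python
import numpy as np

# Three workers with heterogeneous data: local gradients disagree, and the
# coordinate-wise median and mean even differ in sign, so sign-of-median
# updates would move against the true descent direction.
local_grads = np.array([[-1.0], [-1.2], [5.0]])  # one parameter, 3 workers

mean_grad = local_grads.mean(axis=0)          # ~0.93  -> sign +1
median_grad = np.median(local_grads, axis=0)  # -1.0   -> sign -1

# Perturbing each local gradient with independent noise (sketch of the
# correction idea) pulls the expected median toward the mean.
rng = np.random.default_rng(0)
noisy = local_grads + rng.uniform(-4.0, 4.0, size=(3, 1000))
smoothed_median = np.median(noisy, axis=0).mean()  # averaged over draws
```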

Learning Sparse Neural Networks via $\ell_0$ and T$\ell_1$ by a Relaxed Variable Splitting Method with Application to Multi-scale Curve Classification

Title Learning Sparse Neural Networks via $\ell_0$ and T$\ell_1$ by a Relaxed Variable Splitting Method with Application to Multi-scale Curve Classification
Authors Fanghui Xue, Jack Xin
Abstract We study sparsification of convolutional neural networks (CNN) by a relaxed variable splitting method with $\ell_0$ and transformed-$\ell_1$ (T$\ell_1$) penalties, with application to complex curves such as texts written in different fonts, and words written with trembling hands simulating those of Parkinson’s disease patients. The CNN contains 3 convolutional layers, each followed by max pooling, and finally a fully connected layer which contains the largest number of network weights. With the $\ell_0$ penalty, we achieved over 99% test accuracy in distinguishing shaky vs. regular fonts or handwriting, with over 86% of the weights in the fully connected layer being zero. Comparable sparsity and test accuracy are also reached with a proper choice of the T$\ell_1$ penalty.
Tasks
Published 2019-02-20
URL http://arxiv.org/abs/1902.07419v1
PDF http://arxiv.org/pdf/1902.07419v1.pdf
PWC https://paperswithcode.com/paper/learning-sparse-neural-networks-via-ell_0-and
Repo
Framework
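The transformed-$\ell_1$ penalty has a simple closed form, $\rho_a(w) = (a+1)|w|/(a+|w|)$, which interpolates between $\ell_0$ (as $a \to 0$) and $\ell_1$ (as $a \to \infty$). A minimal sketch of the penalty itself (the splitting method and training loop are not shown):

```python
import numpy as np

def transformed_l1(w, a=1.0):
    """Transformed-l1 penalty rho_a(w) = (a+1)|w| / (a+|w|), elementwise.
    Small a approximates an l0 indicator; large a approximates |w|."""
    aw = np.abs(w)
    return (a + 1.0) * aw / (a + aw)

w = np.array([0.0, 0.01, 1.0, 100.0])
near_l0 = transformed_l1(w, a=0.01)  # roughly 0/1-like for |w| >> a
near_l1 = transformed_l1(w, a=1e6)   # roughly |w|
```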

Misleading Authorship Attribution of Source Code using Adversarial Learning

Title Misleading Authorship Attribution of Source Code using Adversarial Learning
Authors Erwin Quiring, Alwin Maier, Konrad Rieck
Abstract In this paper, we present a novel attack against authorship attribution of source code. We exploit that recent attribution methods rest on machine learning and thus can be deceived by adversarial examples of source code. Our attack performs a series of semantics-preserving code transformations that mislead learning-based attribution but appear plausible to a developer. The attack is guided by Monte-Carlo tree search that enables us to operate in the discrete domain of source code. In an empirical evaluation with source code from 204 programmers, we demonstrate that our attack has a substantial effect on two recent attribution methods, whose accuracy drops from over 88% to 1% under attack. Furthermore, we show that our attack can imitate the coding style of developers with high accuracy and thereby induce false attributions. We conclude that current approaches for authorship attribution are inappropriate for practical application and there is a need for resilient analysis techniques.
Tasks
Published 2019-05-29
URL https://arxiv.org/abs/1905.12386v2
PDF https://arxiv.org/pdf/1905.12386v2.pdf
PWC https://paperswithcode.com/paper/misleading-authorship-attribution-of-source
Repo
Framework
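The attack loop can be caricatured with a greedy stand-in for the paper's Monte-Carlo tree search: repeatedly apply whichever semantics-preserving transformation lowers the attribution score most. The transformations and the toy "stylistic fingerprint" score below are placeholders, not the paper's attribution model:

```python
import re

def attack(code, transforms, score, steps=5):
    """Greedy stand-in for MCTS-guided search: at each step, apply the
    semantics-preserving transformation that lowers the score the most,
    stopping when no transformation improves it."""
    for _ in range(steps):
        best = min((t(code) for t in transforms), key=score)
        if score(best) >= score(code):
            break
        code = best
    return code

transforms = [
    lambda c: re.sub(r"\bi\b", "idx", c),          # rename a short identifier
    lambda c: c.replace("x += 1", "x = x + 1"),    # expand augmented assignment
]
# Toy "author A fingerprint": counts of += and of the bare identifier i.
score = lambda c: c.count("+=") + len(re.findall(r"\bi\b", c))

out = attack("for i in items: x += 1", transforms, score)
```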

Maximum Causal Entropy Specification Inference from Demonstrations

Title Maximum Causal Entropy Specification Inference from Demonstrations
Authors Marcell Vazquez-Chanlatte, Sanjit A. Seshia
Abstract In many settings (e.g., robotics) demonstrations provide a natural way to specify tasks; however, most methods for learning from demonstrations either do not provide guarantees that the artifacts learned for the tasks, such as rewards or policies, can be safely composed and/or do not explicitly capture history dependencies. Motivated by this deficit, recent works have proposed learning Boolean task specifications, a class of Boolean non-Markovian rewards which admit well-defined composition and explicitly handle historical dependencies. This work continues this line of research by adapting maximum causal entropy inverse reinforcement learning to estimate the posterior probability of a specification given a multi-set of demonstrations. The key algorithmic insight is to leverage the extensive literature and tooling on reduced ordered binary decision diagrams to efficiently encode a time-unrolled Markov Decision Process. This enables transforming a naive exponential-time algorithm into a polynomial-time algorithm.
Tasks
Published 2019-07-26
URL https://arxiv.org/abs/1907.11792v4
PDF https://arxiv.org/pdf/1907.11792v4.pdf
PWC https://paperswithcode.com/paper/learning-task-specifications-from-2
Repo
Framework
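The inference goal, scoring candidate specifications by a posterior given demonstrations, can be sketched without any of the paper's BDD machinery. The specifications, demonstrations, and exponential-weight likelihood below are toy placeholders in a loosely maximum-entropy spirit:

```python
import math

def posterior(specs, demos, beta=2.0):
    """Toy posterior over candidate Boolean specifications: weight each
    spec by exp(beta * fraction of demonstrations satisfying it), then
    normalize. A caricature of the inference goal, not the paper's method."""
    weights = {}
    for name, satisfies in specs.items():
        frac = sum(1 for d in demos if satisfies(d)) / len(demos)
        weights[name] = math.exp(beta * frac)
    z = sum(weights.values())
    return {name: w / z for name, w in weights.items()}

demos = [["a", "b"], ["a", "c"], ["a"]]   # demonstrations as event traces
specs = {
    "always_a": lambda d: "a" in d,       # satisfied by every demo
    "eventually_b": lambda d: "b" in d,   # satisfied by one demo
}
post = posterior(specs, demos)            # "always_a" dominates
```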

Tracing Player Knowledge in a Parallel Programming Educational Game

Title Tracing Player Knowledge in a Parallel Programming Educational Game
Authors Pavan Kantharaju, Katelyn Alderfer, Jichen Zhu, Bruce Char, Brian Smith, Santiago Ontañón
Abstract This paper focuses on “tracing player knowledge” in educational games. Specifically, given a set of concepts or skills required to master a game, the goal is to estimate the likelihood with which the current player has mastery of each of those concepts or skills. The main contribution of the paper is an approach that integrates machine learning and domain knowledge rules to find when the player applied a certain skill and either succeeded or failed. This is then given as input to a standard knowledge tracing module (such as those from Intelligent Tutoring Systems) to perform knowledge tracing. We evaluate our approach in the context of an educational game called “Parallel” to teach parallel and concurrent programming with data collected from real users, showing our approach can predict students’ skills with a low mean squared error.
Tasks Knowledge Tracing
Published 2019-08-15
URL https://arxiv.org/abs/1908.05632v1
PDF https://arxiv.org/pdf/1908.05632v1.pdf
PWC https://paperswithcode.com/paper/tracing-player-knowledge-in-a-parallel
Repo
Framework
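A standard knowledge-tracing module of the kind the paper plugs its skill-detection output into is Bayesian Knowledge Tracing. A minimal sketch (the parameter values are illustrative, and the paper does not specify which tracing module or parameters it uses):

```python
def bkt_update(p_know, correct, slip=0.1, guess=0.2, learn=0.3):
    """One Bayesian Knowledge Tracing step: Bayes-update the mastery
    probability given an observed success/failure, then apply the
    probability of learning from the attempt."""
    if correct:
        num = p_know * (1 - slip)
        den = num + (1 - p_know) * guess
    else:
        num = p_know * slip
        den = num + (1 - p_know) * (1 - guess)
    posterior = num / den
    return posterior + (1 - posterior) * learn

# Trace one skill through a sequence of detected attempt outcomes.
p = 0.4
for outcome in [True, True, False, True]:
    p = bkt_update(p, outcome)
```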

Trial of an AI: Empowering people to explore law and science challenges

Title Trial of an AI: Empowering people to explore law and science challenges
Authors Gaudron Arthur
Abstract Artificial Intelligence represents many things: a new market to conquer or a quality label for tech companies, a threat for traditional industries, a menace for democracy, or a blessing for our busy everyday life. The press abounds in examples illustrating these aspects, but one should not draw hasty and premature conclusions. The first successes in AI came as a surprise to society at large, including researchers in the field. Today, after the initial stupefaction, we have examples of the system’s reactions: traditional companies are heavily investing in AI, social platforms are monitored during elections, data collection is more and more regulated, etc. The resilience of an organization (i.e. its capacity to resist a shock) relies deeply on its perception of its environment. Future problems have to be anticipated, while unforeseen events have to be quickly identified in order to be mitigated as fast as possible. The author argues that this clear perception starts with a common definition of AI in terms of capacities and limits. AI practitioners should make notions and concepts accessible to the general public and the impacted fields (e.g. industries, law, education). It is a truism that only law experts have the potential to estimate AI’s impact on the judicial system. However, questions remain on how to connect different kinds of expertise and what level of detail is appropriate for the knowledge exchanges. The same consideration holds for dissemination towards society. Ultimately, society will live with decisions made by the “experts”. It sounds wise to involve society in the decision process rather than risk paying the consequences later. Therefore, society also needs the key concepts to understand AI’s impact on everyday life.
This was the purpose of the trial of an AI that took place in October 2018 at the Court of Appeal of Paris: gathering experts from various fields to expose challenges in law and science to a general public.
Tasks
Published 2019-03-05
URL http://arxiv.org/abs/1903.09518v1
PDF http://arxiv.org/pdf/1903.09518v1.pdf
PWC https://paperswithcode.com/paper/trial-of-an-ai-empowering-people-to-explore
Repo
Framework

Efficient Dynamic WFST Decoding for Personalized Language Models

Title Efficient Dynamic WFST Decoding for Personalized Language Models
Authors Jun Liu, Jiedan Zhu, Vishal Kathuria, Fuchun Peng
Abstract We propose a two-layer cache mechanism to speed up dynamic WFST decoding with personalized language models. The first layer is a public cache that stores most of the static part of the graph. This is shared globally among all users. The second layer is a private cache that caches the graph that represents the personalized language model, which is only shared by the utterances from a particular user. We also propose two simple yet effective pre-initialization methods, one based on breadth-first search, and another based on a data-driven exploration of decoder states using previous utterances. Experiments with a calling speech recognition task using a personalized contact list demonstrate that the proposed public cache reduces decoding time by a factor of three compared to decoding without pre-initialization. Using the private cache provides additional efficiency gains, reducing decoding time by a factor of five.
Tasks Language Modelling, Speech Recognition
Published 2019-10-23
URL https://arxiv.org/abs/1910.10670v1
PDF https://arxiv.org/pdf/1910.10670v1.pdf
PWC https://paperswithcode.com/paper/efficient-dynamic-wfst-decoding-for
Repo
Framework
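The two-layer lookup discipline can be sketched with plain dictionaries: check the shared public cache first, then fall back to a per-user private cache that is built on demand. The keys, values, and build function below are placeholders, not the actual WFST state representation:

```python
class TwoLayerCache:
    """Sketch of the two-layer idea: a globally shared public cache for the
    static part of the decoding graph, backed by a per-user private cache
    for states from the personalized language model."""

    def __init__(self, public):
        self.public = public   # shared among all users, read-only here
        self.private = {}      # per-user personalized states

    def lookup(self, state, build_fn):
        if state in self.public:
            return self.public[state]
        if state not in self.private:
            self.private[state] = build_fn(state)  # expand on demand
        return self.private[state]

shared = {"hello": "static-arc"}          # pre-initialized public cache
cache = TwoLayerCache(shared)
hit = cache.lookup("hello", lambda s: "built")            # public hit
personal = cache.lookup("alice_contact", lambda s: "personal-arc")
```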

High-dimensional Nonlinear Profile Monitoring based on Deep Probabilistic Autoencoders

Title High-dimensional Nonlinear Profile Monitoring based on Deep Probabilistic Autoencoders
Authors Nurettin Sergin, Hao Yan
Abstract The wide accessibility of imaging and profile sensors in modern industrial systems has created an abundance of high-dimensional sensing variables. This has led to growing interest in research on high-dimensional process monitoring. However, most approaches in the literature assume the in-control population to lie on a linear manifold with a given basis (e.g., spline, wavelet, kernel) or an unknown basis (e.g., principal component analysis and its variants), and so cannot efficiently model profiles with a nonlinear manifold, which is common in many real-life cases. We propose deep probabilistic autoencoders as a viable unsupervised learning approach to model such manifolds. To do so, we formulate nonlinear and probabilistic extensions of the monitoring statistics from classical approaches as the expected reconstruction error (ERE) and the KL-divergence (KLD) based monitoring statistics. Through an extensive simulation study, we provide insights into why latent-space based statistics are unreliable and why residual-space based ones typically perform much better for deep learning based approaches. Finally, we demonstrate the superiority of deep probabilistic models via both a simulation study and a real-life case study involving images of defects from a hot steel rolling process.
Tasks
Published 2019-11-01
URL https://arxiv.org/abs/1911.00482v1
PDF https://arxiv.org/pdf/1911.00482v1.pdf
PWC https://paperswithcode.com/paper/high-dimensional-nonlinear-profile-monitoring
Repo
Framework
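The two families of monitoring statistics can be sketched for a variational autoencoder with a diagonal Gaussian latent: a latent-space statistic from the KL divergence to the standard normal prior, and a residual-space statistic from the reconstruction error. This is a generic sketch of the two quantities, not the paper's exact formulation:

```python
import numpy as np

def kld_statistic(mu, logvar):
    """Latent-space statistic: KL divergence of the encoder's diagonal
    Gaussian N(mu, exp(logvar)) from the standard normal prior."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def ere_statistic(x, reconstructions):
    """Residual-space statistic: mean squared error between the observed
    profile x and its (Monte-Carlo) reconstructions."""
    return np.mean((reconstructions - x) ** 2)

# A latent that exactly matches the prior scores zero on the KLD statistic.
zero_kld = kld_statistic(np.zeros(2), np.zeros(2))
# A perfect reconstruction scores zero on the ERE statistic.
x = np.array([1.0, 2.0, 3.0])
zero_ere = ere_statistic(x, np.stack([x, x]))
```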

Towards Federated Graph Learning for Collaborative Financial Crimes Detection

Title Towards Federated Graph Learning for Collaborative Financial Crimes Detection
Authors Toyotaro Suzumura, Yi Zhou, Natahalie Baracaldo, Guangnan Ye, Keith Houck, Ryo Kawahara, Ali Anwar, Lucia Larise Stavarache, Yuji Watanabe, Pablo Loyola, Daniel Klyashtorny, Heiko Ludwig, Kumar Bhaskaran
Abstract Financial crime is a large and growing problem, in some way touching almost every financial institution. Financial institutions are the front line in the war against financial crime and accordingly must devote substantial human and technology resources to this effort. Current processes to detect financial misconduct are limited in their ability to effectively differentiate between malicious behavior and ordinary financial activity. These limitations tend to result in gross over-reporting of suspicious activity that necessitates time-intensive and costly manual review. Advances in technology used in this domain, including machine learning based approaches, can improve upon the effectiveness of financial institutions’ existing processes; however, a key challenge that most financial institutions continue to face is that they address financial crimes in isolation, without any insight from other firms. When financial institutions address financial crimes through the lens of their own firm, perpetrators may devise sophisticated strategies that span institutions and geographies. Financial institutions continue to work relentlessly to advance their capabilities, forming partnerships across institutions to share insights, patterns and capabilities. These public-private partnerships are subject to stringent regulatory and data privacy requirements, making it difficult to rely on traditional technology solutions. In this paper, we propose a methodology to share key information across institutions by using a federated graph learning platform that enables us to build more accurate machine learning models by leveraging both federated learning and graph learning approaches. We demonstrate that our federated model outperforms the local model by 20% on the UK FCA TechSprint data set. This new platform opens the door to efficiently detecting global money laundering activity.
Tasks
Published 2019-09-19
URL https://arxiv.org/abs/1909.12946v2
PDF https://arxiv.org/pdf/1909.12946v2.pdf
PWC https://paperswithcode.com/paper/towards-federated-graph-learning-for
Repo
Framework
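The federated part of such a platform typically rests on federated averaging: institutions share model parameters, never raw transaction data, and a coordinator averages them weighted by local data size. A minimal sketch (the parameters and sample counts are made up, and the paper's graph-learning component is not shown):

```python
import numpy as np

def fed_avg(local_weights, n_samples):
    """Federated averaging: combine per-institution model parameters,
    weighted by the number of local training samples."""
    total = sum(n_samples)
    return sum(w * (n / total) for w, n in zip(local_weights, n_samples))

bank_a = np.array([1.0, 2.0])  # local model parameters (illustrative)
bank_b = np.array([3.0, 4.0])
global_w = fed_avg([bank_a, bank_b], n_samples=[100, 300])
# bank_b's parameters dominate because it holds 3x the data.
```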

Ellipsoidal Trust Region Methods for Neural Network Training

Title Ellipsoidal Trust Region Methods for Neural Network Training
Authors Leonard Adolphs, Jonas Kohler, Aurelien Lucchi
Abstract We investigate the use of ellipsoidal trust region constraints for second-order optimization of neural networks. This approach can be seen as a higher-order counterpart of adaptive gradient methods, which we here show to be interpretable as first-order trust region methods with ellipsoidal constraints. In particular, we show that the preconditioning matrix used in RMSProp and Adam satisfies the necessary conditions for convergence of (first- and) second-order trust region methods, and report that this ellipsoidal constraint consistently outperforms its spherical counterpart in practice.
Tasks
Published 2019-05-22
URL https://arxiv.org/abs/1905.09201v3
PDF https://arxiv.org/pdf/1905.09201v3.pdf
PWC https://paperswithcode.com/paper/ellipsoidal-trust-region-methods-and-the
Repo
Framework
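The first-order view can be made concrete: minimizing $g^\top s$ subject to $\|D^{1/2} s\| \le r$ with a diagonal preconditioner $D$ gives $s = -r\, D^{-1} g / \|D^{-1/2} g\|$, so with $D$ the RMSProp/Adam second-moment estimate the step direction matches the adaptive-gradient update. A sketch of that closed-form step (a generic derivation consistent with the abstract, not code from the paper):

```python
import numpy as np

def ellipsoidal_tr_step(g, d, radius):
    """First-order trust-region step with ellipsoidal constraint
    ||diag(d)^(1/2) s|| <= radius.  Closed form:
    s = -radius * (g / d) / ||g / sqrt(d)||."""
    scaled = g / np.sqrt(d)
    return -radius * (g / d) / np.linalg.norm(scaled)

g = np.array([3.0, 4.0])
d_sph = np.array([1.0, 1.0])     # spherical case: normalized gradient descent
s_sph = ellipsoidal_tr_step(g, d_sph, radius=1.0)

d_ell = np.array([4.0, 1.0])     # ellipsoidal case: per-coordinate scaling
s_ell = ellipsoidal_tr_step(g, d_ell, radius=2.0)
```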

Instance Cross Entropy for Deep Metric Learning

Title Instance Cross Entropy for Deep Metric Learning
Authors Xinshao Wang, Elyor Kodirov, Yang Hua, Neil Robertson
Abstract Loss functions play a crucial role in deep metric learning, and thus a variety of them have been proposed. Some supervise the learning process by pairwise or tripletwise similarity constraints, while others take advantage of structured similarity information among multiple data points. In this work, we approach deep metric learning from a novel perspective. We propose instance cross entropy (ICE), which measures the difference between an estimated instance-level matching distribution and its ground-truth counterpart. ICE has three main appealing properties. Firstly, similar to categorical cross entropy (CCE), ICE has a clear probabilistic interpretation and exploits structured semantic similarity information for learning supervision. Secondly, ICE is scalable to infinite training data, as it learns on mini-batches iteratively and is independent of the training set size. Thirdly, motivated by our relative weight analysis, seamless sample reweighting is incorporated. It rescales samples’ gradients to control the degree of differentiation among training examples instead of truncating them by sample mining. In addition to its simplicity and intuitiveness, extensive experiments on three real-world benchmarks demonstrate the superiority of ICE.
Tasks Metric Learning, Semantic Similarity, Semantic Textual Similarity
Published 2019-11-22
URL https://arxiv.org/abs/1911.09976v1
PDF https://arxiv.org/pdf/1911.09976v1.pdf
PWC https://paperswithcode.com/paper/instance-cross-entropy-for-deep-metric-1
Repo
Framework
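The matching-distribution idea can be sketched in a few lines: softmax over a query's similarities to the instances in a batch, then cross-entropy against the ground-truth matching instance. This is only the core loss shape in the spirit of ICE; the paper's reweighting and exact formulation are omitted:

```python
import numpy as np

def instance_cross_entropy(query, instances, positive_idx, temperature=1.0):
    """Instance-level matching loss sketch: softmax over the query's
    similarities to all instances, then negative log-probability of the
    ground-truth matching instance."""
    sims = instances @ query / temperature
    sims -= sims.max()                       # numerical stability
    p = np.exp(sims) / np.exp(sims).sum()    # estimated matching distribution
    return -np.log(p[positive_idx])

q = np.array([1.0, 0.0])
batch = np.array([[1.0, 0.0],    # true match
                  [0.0, 1.0],    # unrelated instance
                  [-1.0, 0.0]])  # opposite instance
loss = instance_cross_entropy(q, batch, positive_idx=0)
```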