Paper Group ANR 520
Orthogonal and Idempotent Transformations for Learning Deep Neural Networks
Title | Orthogonal and Idempotent Transformations for Learning Deep Neural Networks |
Authors | Jingdong Wang, Yajie Xing, Kexin Zhang, Cha Zhang |
Abstract | Identity transformations, used as skip-connections in residual networks, directly connect convolutional layers close to the input and those close to the output in deep neural networks, improving information flow and thus easing the training. In this paper, we introduce two alternative linear transforms: orthogonal transformations and idempotent transformations. By the defining properties of orthogonal and idempotent matrices, the product of multiple orthogonal matrices (or of copies of the same idempotent matrix) used to form linear transformations is itself a single orthogonal (idempotent) matrix, so information flow is improved and training is eased. Interestingly, the success essentially stems from feature reuse and gradient reuse during forward and backward propagation, which maintain information during the flow and eliminate the vanishing-gradient problem thanks to the expressway created by skip-connections. We empirically demonstrate the effectiveness of the two proposed transformations: they match identity transformations in single-branch networks and even outperform them in multi-branch networks. |
Tasks | |
Published | 2017-07-19 |
URL | http://arxiv.org/abs/1707.05974v1 |
http://arxiv.org/pdf/1707.05974v1.pdf | |
PWC | https://paperswithcode.com/paper/orthogonal-and-idempotent-transformations-for |
Repo | |
Framework | |
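The abstract's central algebraic fact — that products of orthogonal matrices remain orthogonal, and that repeated applications of an idempotent matrix collapse to a single application — can be checked numerically. A minimal NumPy sketch (an illustration, not the paper's network code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Orthogonal matrices satisfy Q.T @ Q = I, and products of orthogonal
# matrices are themselves orthogonal, so a stack of such linear
# transforms behaves like a single norm-preserving map.
q1, _ = np.linalg.qr(rng.standard_normal((4, 4)))
q2, _ = np.linalg.qr(rng.standard_normal((4, 4)))
product = q1 @ q2
assert np.allclose(product.T @ product, np.eye(4))

# Idempotent matrices satisfy M @ M = M, e.g. an orthogonal projector
# onto a subspace. Repeated application collapses to one application.
a = rng.standard_normal((4, 2))
m = a @ np.linalg.pinv(a)          # projector onto the column space of a
assert np.allclose(m @ m, m)
assert np.allclose(m @ m @ m, m)
```

This is exactly the property that keeps a chain of such layer-to-layer transforms equivalent to one well-behaved linear map, which is what eases information flow through a deep stack.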
Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning
Title | Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning |
Authors | Zhen He, Shaobing Gao, Liang Xiao, Daxue Liu, Hangen He, David Barber |
Abstract | Long Short-Term Memory (LSTM) is a popular approach to boosting the ability of Recurrent Neural Networks to store longer-term temporal information. The capacity of an LSTM network can be increased by widening it or by adding layers; however, the former typically introduces additional parameters, while the latter increases the runtime. As an alternative, we propose the Tensorized LSTM, in which the hidden states are represented by tensors and updated via a cross-layer convolution. By increasing the tensor size, the network can be widened efficiently without additional parameters, since the parameters are shared across different locations in the tensor; by delaying the output, the network can be deepened implicitly with little additional runtime, since deep computations for each timestep are merged into the temporal computations of the sequence. Experiments conducted on five challenging sequence learning tasks show the potential of the proposed model. |
Tasks | |
Published | 2017-11-05 |
URL | http://arxiv.org/abs/1711.01577v3 |
http://arxiv.org/pdf/1711.01577v3.pdf | |
PWC | https://paperswithcode.com/paper/wider-and-deeper-cheaper-and-faster |
Repo | |
Framework | |
Anti-Makeup: Learning A Bi-Level Adversarial Network for Makeup-Invariant Face Verification
Title | Anti-Makeup: Learning A Bi-Level Adversarial Network for Makeup-Invariant Face Verification |
Authors | Yi Li, Lingxiao Song, Xiang Wu, Ran He, Tieniu Tan |
Abstract | Makeup is widely used to improve facial attractiveness and is well accepted by the public. However, different makeup styles cause significant changes in facial appearance, and matching makeup and non-makeup face images remains a challenging problem. This paper proposes a learning-from-generation approach for makeup-invariant face verification by introducing a bi-level adversarial network (BLAN). To alleviate the negative effects of makeup, we first generate non-makeup images from makeup ones, and then use the synthesized non-makeup images for further verification. Two adversarial networks in BLAN are integrated into an end-to-end deep network, one at the pixel level for reconstructing appealing facial images and the other at the feature level for preserving identity information. These two networks jointly reduce the sensing gap between makeup and non-makeup images. Moreover, we keep the generator well constrained by incorporating multiple perceptual losses. Experimental results on three benchmark makeup face datasets demonstrate that our method achieves state-of-the-art verification accuracy across makeup status and can produce photo-realistic non-makeup face images. |
Tasks | Face Verification |
Published | 2017-09-12 |
URL | http://arxiv.org/abs/1709.03654v2 |
http://arxiv.org/pdf/1709.03654v2.pdf | |
PWC | https://paperswithcode.com/paper/anti-makeup-learning-a-bi-level-adversarial |
Repo | |
Framework | |
Mostly Exploration-Free Algorithms for Contextual Bandits
Title | Mostly Exploration-Free Algorithms for Contextual Bandits |
Authors | Hamsa Bastani, Mohsen Bayati, Khashayar Khosravi |
Abstract | The contextual bandit literature has traditionally focused on algorithms that address the exploration-exploitation tradeoff. In particular, greedy algorithms that exploit current estimates without any exploration may be sub-optimal in general. However, exploration-free greedy algorithms are desirable in practical settings where exploration may be costly or unethical (e.g., clinical trials). Surprisingly, we find that a simple greedy algorithm can be rate optimal (achieves asymptotically optimal regret) if there is sufficient randomness in the observed contexts (covariates). We prove that this is always the case for a two-armed bandit under a general class of context distributions that satisfy a condition we term covariate diversity. Furthermore, even absent this condition, we show that a greedy algorithm can be rate optimal with positive probability. Thus, standard bandit algorithms may unnecessarily explore. Motivated by these results, we introduce Greedy-First, a new algorithm that uses only observed contexts and rewards to determine whether to follow a greedy algorithm or to explore. We prove that this algorithm is rate optimal without any additional assumptions on the context distribution or the number of arms. Extensive simulations demonstrate that Greedy-First successfully reduces exploration and outperforms existing (exploration-based) contextual bandit algorithms such as Thompson sampling or upper confidence bound (UCB). |
Tasks | Multi-Armed Bandits |
Published | 2017-04-28 |
URL | https://arxiv.org/abs/1704.09011v7 |
https://arxiv.org/pdf/1704.09011v7.pdf | |
PWC | https://paperswithcode.com/paper/mostly-exploration-free-algorithms-for |
Repo | |
Framework | |
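The greedy algorithm the abstract analyzes can be sketched as a linear contextual bandit that always exploits its current ridge-regression estimates, with no exploration step. The toy below is an illustrative assumption, not the paper's setup (two arms, a linear reward model, Gaussian noise): it shows how diverse contexts supply the "free" exploration that can make pure exploitation competitive.

```python
import numpy as np

def greedy_linear_bandit(contexts, reward_fn, n_arms=2, lam=1.0, noise=0.1, seed=0):
    """Purely greedy contextual bandit: at every round, pull the arm whose
    current ridge-regression estimate predicts the highest reward."""
    rng = np.random.default_rng(seed)
    d = contexts.shape[1]
    A = [lam * np.eye(d) for _ in range(n_arms)]   # per-arm Gram matrices
    b = [np.zeros(d) for _ in range(n_arms)]       # per-arm response sums
    total = 0.0
    for x in contexts:
        estimates = [x @ np.linalg.solve(A[k], b[k]) for k in range(n_arms)]
        arm = int(np.argmax(estimates))            # exploit only, never explore
        r = reward_fn(arm, x) + noise * rng.standard_normal()
        A[arm] += np.outer(x, x)                   # ridge update for pulled arm
        b[arm] += r * x
        total += r
    return total

# Toy two-armed problem: arm k's mean reward is x @ beta[k]. Standard
# normal contexts provide the covariate diversity the abstract refers to.
rng = np.random.default_rng(1)
beta = np.array([[1.0, -0.5], [-0.5, 1.0]])
X = rng.standard_normal((2000, 2))
total_reward = greedy_linear_bandit(X, lambda k, x: float(x @ beta[k]))
```

Greedy-First itself additionally monitors the observed contexts and rewards to decide whether to keep following this greedy rule or to fall back to an exploring algorithm; that switching test is not reproduced here.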
A Probabilistic Linear Genetic Programming with Stochastic Context-Free Grammar for solving Symbolic Regression problems
Title | A Probabilistic Linear Genetic Programming with Stochastic Context-Free Grammar for solving Symbolic Regression problems |
Authors | Léo Françoso Dal Piccol Sotto, Vinícius Veloso de Melo |
Abstract | Traditional Linear Genetic Programming (LGP) algorithms rely only on the selection mechanism to guide the search: genetic operators combine or mutate random portions of the individuals without knowing whether the result will lead to a fitter individual. Probabilistic Model Building Genetic Programming (PMB-GP) methods were proposed to overcome this issue through a probability model that captures the structure of fit individuals and uses it to sample new individuals. This work proposes the use of LGP with a Stochastic Context-Free Grammar (SCFG) whose probability distribution is updated according to selected individuals. We propose a method for adapting the grammar to the linear representation of LGP. Tests performed with the proposed probabilistic method, and with two hybrid approaches, on several symbolic regression benchmark problems show that the results are statistically better than those obtained by traditional LGP. |
Tasks | |
Published | 2017-04-03 |
URL | http://arxiv.org/abs/1704.00828v1 |
http://arxiv.org/pdf/1704.00828v1.pdf | |
PWC | https://paperswithcode.com/paper/a-probabilistic-linear-genetic-programming |
Repo | |
Framework | |
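The core idea — sampling programs from a stochastic grammar whose production probabilities are shifted toward selected individuals — can be illustrated with a toy SCFG over arithmetic expressions. Everything below (the grammar, the depth limit, the length-based "fitness") is an illustrative assumption, not the paper's method or its linear-representation adaptation.

```python
import random

random.seed(0)

# Minimal stochastic context-free grammar: one nonterminal "E" with
# weighted productions. The weights are the probability model that
# gets nudged toward productions used by selected individuals.
GRAMMAR = {"E": [("E+E", 1.0), ("E*E", 1.0), ("x", 1.0), ("1", 1.0)]}

def sample_expr(depth=0, max_depth=3):
    """Sample one expression; returns (string, list of productions used)."""
    prods = GRAMMAR["E"]
    if depth >= max_depth:                  # force a terminal at the depth limit
        prods = [p for p in prods if "E" not in p[0]]
    choices, weights = zip(*prods)
    prod = random.choices(choices, weights=weights)[0]
    used, out = [prod], ""
    for ch in prod:
        if ch == "E":
            sub, sub_used = sample_expr(depth + 1, max_depth)
            out += "(" + sub + ")"
            used += sub_used
        else:
            out += ch
    return out, used

def update_grammar(selected_used, lr=0.1):
    """PMB-GP-style model update: raise the weight of every production
    that appeared in the selected (fitter) individuals."""
    counts = {}
    for used in selected_used:
        for prod in used:
            counts[prod] = counts.get(prod, 0) + 1
    GRAMMAR["E"] = [(p, w + lr * counts.get(p, 0)) for p, w in GRAMMAR["E"]]

# One generation: sample a population, "select" the shortest expressions
# as a stand-in fitness, and shift probability mass toward their productions.
pop = [sample_expr() for _ in range(20)]
selected = sorted(pop, key=lambda t: len(t[0]))[:5]
update_grammar([used for _, used in selected])
```

In the paper the model is updated from individuals selected by regression fitness rather than by length, and the grammar is mapped onto LGP's linear instruction representation.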
Globally Optimal Symbolic Regression
Title | Globally Optimal Symbolic Regression |
Authors | Vernon Austel, Sanjeeb Dash, Oktay Gunluk, Lior Horesh, Leo Liberti, Giacomo Nannicini, Baruch Schieber |
Abstract | In this study we introduce a new technique for symbolic regression that guarantees global optimality. This is achieved by formulating a mixed integer non-linear program (MINLP) whose solution is a symbolic mathematical expression of minimum complexity that explains the observations. We demonstrate our approach by rediscovering Kepler’s law on planetary motion using exoplanet data and Galileo’s pendulum periodicity equation using experimental data. |
Tasks | |
Published | 2017-10-29 |
URL | http://arxiv.org/abs/1710.10720v3 |
http://arxiv.org/pdf/1710.10720v3.pdf | |
PWC | https://paperswithcode.com/paper/globally-optimal-symbolic-regression |
Repo | |
Framework | |
ADINE: An Adaptive Momentum Method for Stochastic Gradient Descent
Title | ADINE: An Adaptive Momentum Method for Stochastic Gradient Descent |
Authors | Vishwak Srinivasan, Adepu Ravi Sankar, Vineeth N Balasubramanian |
Abstract | Two major momentum-based techniques that have achieved tremendous success in optimization are Polyak’s heavy ball method and Nesterov’s accelerated gradient. A crucial step in all momentum-based methods is the choice of the momentum parameter $m$, which is almost always suggested to be set to less than $1$. Although the choice of $m < 1$ is justified only under very strong theoretical assumptions, it works well in practice even when the assumptions do not necessarily hold. In this paper, we propose a new momentum-based method, $\textit{ADINE}$, which relaxes the constraint $m < 1$ and allows the learning algorithm to use adaptive higher momentum. We motivate our hypothesis on $m$ by experimentally verifying that a higher momentum ($\ge 1$) can help escape saddle points much faster. Building on this motivation, $\textit{ADINE}$ weighs the previous updates more heavily (by setting the momentum parameter $> 1$). We evaluate the proposed algorithm on deep neural networks and show that $\textit{ADINE}$ helps the learning algorithm converge much faster without compromising the generalization error. |
Tasks | |
Published | 2017-12-20 |
URL | http://arxiv.org/abs/1712.07424v1 |
http://arxiv.org/pdf/1712.07424v1.pdf | |
PWC | https://paperswithcode.com/paper/adine-an-adaptive-momentum-method-for |
Repo | |
Framework | |
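The heavy-ball update the abstract builds on, and the role of the momentum parameter $m$, can be sketched on a toy convex quadratic. Note that on such a problem a *fixed* $m > 1$ makes the recursion diverge — which is precisely why ADINE adapts the momentum rather than fixing it above 1; the adaptive rule itself is not reproduced here.

```python
import numpy as np

def heavy_ball(grad, x0, lr=0.05, m=0.9, steps=500):
    """Polyak heavy-ball update:
        v <- m * v - lr * grad(x);   x <- x + v
    Classical analyses assume m < 1; ADINE's departure is to let m be
    adaptive and exceed 1, e.g. to escape saddle regions faster."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = m * v - lr * grad(x)
        x = x + v
    return x

# Convex quadratic 0.5 * ||x||^2 with its minimum at the origin.
grad = lambda x: x
x_conv = heavy_ball(grad, [5.0, -3.0], m=0.9)            # m < 1: converges
x_div = heavy_ball(grad, [5.0, -3.0], m=1.05, steps=200)  # fixed m > 1: blows up
```

The contrast between the two runs illustrates why a momentum above 1 is only useful when applied adaptively, as ADINE does, rather than held constant throughout training.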
A location-aware embedding technique for accurate landmark recognition
Title | A location-aware embedding technique for accurate landmark recognition |
Authors | Federico Magliani, Navid Mahmoudian Bidgoli, Andrea Prati |
Abstract | The current state of research in landmark recognition highlights the good accuracy that can be achieved by embedding techniques such as Fisher vectors and VLAD. None of these techniques exploits spatial information; i.e., they consider all the features and the corresponding descriptors without embedding their location in the image. This paper presents a new variant of the well-known VLAD (Vector of Locally Aggregated Descriptors) embedding technique which accounts, to a certain degree, for the location of features. The driving motivation comes from the observation that, usually, the most interesting part of an image (e.g., the landmark to be recognized) lies close to the center of the image, while the features at the borders are irrelevant features that do not depend on the landmark. The proposed variant, called locVLAD (location-aware VLAD), computes the mean of two global descriptors: the VLAD computed on the entire original image, and the one computed on a cropped image in which a certain percentage of the image borders is removed. This simple variant achieves an accuracy greater than that of the existing state-of-the-art approach. Experiments are conducted on two public datasets (ZuBuD and Holidays), which are used both for training and testing. Moreover, a more balanced version of ZuBuD is proposed. |
Tasks | |
Published | 2017-04-19 |
URL | http://arxiv.org/abs/1704.05754v1 |
http://arxiv.org/pdf/1704.05754v1.pdf | |
PWC | https://paperswithcode.com/paper/a-location-aware-embedding-technique-for |
Repo | |
Framework | |
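The locVLAD construction described above — averaging the VLAD of the full image with the VLAD of a border-cropped version — can be sketched directly. The plain-VLAD implementation and the 10% border fraction below are standard/illustrative choices, not necessarily the paper's exact configuration.

```python
import numpy as np

def vlad(descriptors, centers):
    """Plain VLAD: accumulate the residual of each local descriptor to
    its nearest codebook center, then L2-normalize the concatenation."""
    k, d = centers.shape
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)
    v = np.zeros((k, d))
    for desc, c in zip(descriptors, assign):
        v[c] += desc - centers[c]
    v = v.ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def loc_vlad(descriptors, positions, centers, img_shape, border=0.1):
    """locVLAD sketch: mean of the VLAD over all descriptors and the
    VLAD over descriptors whose keypoints lie away from the borders."""
    h, w = img_shape
    y, x = positions[:, 0], positions[:, 1]
    inside = ((y > border * h) & (y < (1 - border) * h) &
              (x > border * w) & (x < (1 - border) * w))
    return 0.5 * (vlad(descriptors, centers) + vlad(descriptors[inside], centers))

# Toy usage with random SIFT-like descriptors and a random codebook.
rng = np.random.default_rng(0)
desc = rng.standard_normal((200, 16))
pos = rng.uniform(0, 480, size=(200, 2))    # keypoint (y, x) locations
codebook = rng.standard_normal((8, 16))
g = loc_vlad(desc, pos, codebook, img_shape=(480, 480))
```

Because both component descriptors are L2-normalized, their mean keeps the same dimensionality as a plain VLAD descriptor, so existing retrieval pipelines can use it unchanged.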
Resolving the Complexity of Some Fundamental Problems in Computational Social Choice
Title | Resolving the Complexity of Some Fundamental Problems in Computational Social Choice |
Authors | Palash Dey |
Abstract | This thesis is in the area called computational social choice which is an intersection area of algorithms and social choice theory. |
Tasks | |
Published | 2017-03-23 |
URL | http://arxiv.org/abs/1703.08041v1 |
http://arxiv.org/pdf/1703.08041v1.pdf | |
PWC | https://paperswithcode.com/paper/resolving-the-complexity-of-some-fundamental |
Repo | |
Framework | |
The application of deep convolutional neural networks to ultrasound for modelling of dynamic states within human skeletal muscle
Title | The application of deep convolutional neural networks to ultrasound for modelling of dynamic states within human skeletal muscle |
Authors | Ryan J. Cunningham, Peter J. Harding, Ian D. Loram |
Abstract | This paper concerns the fully automatic, direct, in vivo measurement of active and passive dynamic skeletal muscle states using ultrasound imaging. Despite the long-standing medical need (myopathies, neuropathies, pain, injury, ageing), current technology (electromyography, dynamometry, shear wave imaging) provides no general, non-invasive method for online estimation of skeletal intramuscular states. Ultrasound provides a technology in which static and dynamic muscle states can be observed non-invasively, yet current computational image understanding approaches are inadequate. We propose a new approach in which deep learning methods are used to understand the content of ultrasound images of muscle in terms of its measured state. Ultrasound data synchronized with electromyography of the calf muscles, together with measures of joint torque and angle, were recorded from 19 healthy participants (6 female, age 30 ± 7.7). A segmentation algorithm previously developed by our group was applied to extract a region of interest of the medial gastrocnemius. A deep convolutional neural network was then trained to predict the measured states (joint angle/torque, electromyography) directly from the segmented images. Results reveal for the first time that active and passive muscle states can be measured directly from standard B-mode ultrasound images, accurately predicting, for a held-out test participant, changes in joint angle, electromyography, and torque with errors as low as 0.022°, 0.0001 V, and 0.256 Nm (root mean square error), respectively. |
Tasks | |
Published | 2017-06-28 |
URL | http://arxiv.org/abs/1706.09450v1 |
http://arxiv.org/pdf/1706.09450v1.pdf | |
PWC | https://paperswithcode.com/paper/the-application-of-deep-convolutional-neural |
Repo | |
Framework | |
Joint Adaptive Neighbours and Metric Learning for Multi-view Subspace Clustering
Title | Joint Adaptive Neighbours and Metric Learning for Multi-view Subspace Clustering |
Authors | Nan Xu, Yanqing Guo, Jiujun Wang, Xiangyang Luo, Ran He |
Abstract | Because many real-world data admit multiple views or representations, multi-view learning has drawn much attention recently. Multi-view spectral clustering methods based on similarity matrices or graphs are quite popular. Generally, these algorithms learn informative graphs directly from the original data. However, in real-world applications, the original data often contain noise and outliers that lead to unreliable graphs. In addition, different views may contribute differently to data clustering. In this paper, a novel Multi-view Subspace Clustering method unifying Adaptive neighbours and Metric learning (MSCAM) is proposed to address the above problems. In this method, we use the subspace representations of the different views to adaptively learn a consensus similarity matrix, uncovering the subspace structure and avoiding the noisy nature of the original data. For each view, we also learn a different Mahalanobis matrix that parameterizes the squared distances, thereby accounting for the contribution of that view. Further, we constrain the graph constructed from the similarity matrix to have exactly c connected components, where c is the number of clusters. An iterative algorithm is developed to solve this optimization problem. Experiments on a synthetic dataset and several real-world datasets demonstrate the effectiveness of MSCAM. |
Tasks | Metric Learning, MULTI-VIEW LEARNING, Multi-view Subspace Clustering |
Published | 2017-09-12 |
URL | http://arxiv.org/abs/1709.03656v1 |
http://arxiv.org/pdf/1709.03656v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-adaptive-neighbours-and-metric-learning |
Repo | |
Framework | |
Toward Computation and Memory Efficient Neural Network Acoustic Models with Binary Weights and Activations
Title | Toward Computation and Memory Efficient Neural Network Acoustic Models with Binary Weights and Activations |
Authors | Liang Lu |
Abstract | Neural network acoustic models have significantly advanced the state of the art in speech recognition over the past few years. However, they are usually computationally expensive due to the large number of matrix-vector multiplications and nonlinearity operations. Neural network models also require significant amounts of memory for inference because of their large model size. For these two reasons, it is challenging to deploy neural network based speech recognizers on resource-constrained platforms such as embedded devices. This paper investigates the use of binary weights and activations for computation- and memory-efficient neural network acoustic models. Compared to real-valued weight matrices, binary weights require far fewer bits for storage, thereby cutting down the memory footprint. Furthermore, with binary weights or activations, the matrix-vector multiplications are turned into addition and subtraction operations, which are computationally much faster and more energy-efficient on hardware platforms. In this paper, we study the application of binary weights and activations to neural network acoustic modeling, reporting encouraging results on the WSJ and AMI corpora. |
Tasks | Speech Recognition |
Published | 2017-06-28 |
URL | http://arxiv.org/abs/1706.09453v2 |
http://arxiv.org/pdf/1706.09453v2.pdf | |
PWC | https://paperswithcode.com/paper/toward-computation-and-memory-efficient |
Repo | |
Framework | |
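The abstract's key observation — that with weights in {-1, +1} a matrix-vector product reduces to additions and subtractions — is easy to verify. The per-matrix scaling factor `alpha` below is a common binarization choice (BinaryConnect/XNOR-Net style) and an assumption, not necessarily this paper's exact scheme.

```python
import numpy as np

def binarize(w):
    """Sign-binarize a real weight matrix with a mean-magnitude scale."""
    alpha = np.abs(w).mean()
    return np.sign(w), alpha

def binary_matvec(wb, alpha, x):
    """With weights in {-1, +1}, the matrix-vector product needs no
    multiplications: sum the inputs where the weight is +1 and subtract
    them where it is -1, then apply one scalar scale."""
    pos = wb > 0
    return alpha * ((x * pos).sum(axis=1) - (x * ~pos).sum(axis=1))

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16))   # toy layer: 16 inputs, 8 outputs
x = rng.standard_normal(16)
wb, alpha = binarize(w)
approx = binary_matvec(wb, alpha, x)
exact = (alpha * wb) @ x           # reference: ordinary matmul on the binarized weights
assert np.allclose(approx, exact)
```

On real hardware the +1/-1 entries would be packed into single bits (hence the 32x storage saving over float32), with the add/subtract accumulation replacing multiply-accumulate units.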
Evaluation of Direct Haptic 4D Volume Rendering of Partially Segmented Data for Liver Puncture Simulation
Title | Evaluation of Direct Haptic 4D Volume Rendering of Partially Segmented Data for Liver Puncture Simulation |
Authors | Andre Mastmeyer, Dirk Fortmeier, Heinz Handels |
Abstract | This work presents an evaluation study, using a force-feedback evaluation framework, of a novel direct needle-force volume rendering concept in the context of liver puncture simulation. PTC/PTCD puncture interventions targeting the bile ducts were selected to illustrate this concept. The haptic algorithms of the simulator system are based on (1) partially segmented patient image data and (2) a non-linear spring model effective at organ borders. The primary aim is to quantitatively evaluate the force errors caused by our patient modeling approach, compared to the haptic force output obtained from gold-standard, completely manually segmented data. This evaluation yields a low mean root-mean-squared force error of 0.12 N, with systematic maximum absolute errors of up to 1.6 N. Force errors were evaluated on 31,222 preplanned test paths from 10 patients; only twelve percent of the emitted forces along these paths were affected by errors. This is the first study evaluating haptic algorithms with deformable virtual patients in silico. We demonstrate haptic rendering plausibility on a very large number of test paths, and the important errors are below just-noticeable differences for the hand-arm system. |
Tasks | |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.07118v1 |
http://arxiv.org/pdf/1705.07118v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluation-of-direct-haptic-4d-volume |
Repo | |
Framework | |
Guided Proofreading of Automatic Segmentations for Connectomics
Title | Guided Proofreading of Automatic Segmentations for Connectomics |
Authors | Daniel Haehn, Verena Kaynig, James Tompkin, Jeff W. Lichtman, Hanspeter Pfister |
Abstract | Automatic cell image segmentation methods in connectomics produce merge and split errors, which require correction through proofreading. Previous research has identified the visual search for these errors as the bottleneck in interactive proofreading. To aid error correction, we develop two classifiers that automatically recommend candidate merges and splits to the user. These classifiers use a convolutional neural network (CNN) that has been trained with errors in automatic segmentations against expert-labeled ground truth. Our classifiers detect potentially erroneous regions by considering a large context region around a segmentation boundary. Corrections can then be performed by a user with yes/no decisions, which reduces variation of information 7.5x faster than previous proofreading methods. We also present a fully automatic mode that uses a probability threshold to make merge/split decisions. Extensive experiments using the automatic approach and comparing the performance of novice and expert users demonstrate that our method performs favorably against state-of-the-art proofreading methods on different connectomics datasets. |
Tasks | Semantic Segmentation |
Published | 2017-04-04 |
URL | http://arxiv.org/abs/1704.00848v1 |
http://arxiv.org/pdf/1704.00848v1.pdf | |
PWC | https://paperswithcode.com/paper/guided-proofreading-of-automatic |
Repo | |
Framework | |
Dirichlet Bayesian Network Scores and the Maximum Relative Entropy Principle
Title | Dirichlet Bayesian Network Scores and the Maximum Relative Entropy Principle |
Authors | Marco Scutari |
Abstract | A classic approach for learning Bayesian networks from data is to identify a maximum a posteriori (MAP) network structure. In the case of discrete Bayesian networks, MAP networks are selected by maximising one of several possible Bayesian Dirichlet (BD) scores; the most famous is the Bayesian Dirichlet equivalent uniform (BDeu) score from Heckerman et al (1995). The key properties of BDeu arise from its uniform prior over the parameters of each local distribution in the network: it makes structure learning computationally efficient; it does not require the elicitation of prior knowledge from experts; and it satisfies score equivalence. In this paper we review the derivation and the properties of BD scores, and of BDeu in particular, and we link them to the corresponding entropy estimates in order to study them from an information-theoretic perspective. To this end, we work in the context of the foundational work of Giffin and Caticha (2007), who showed that Bayesian inference can be framed as a particular case of the maximum relative entropy principle. We use this connection to show that BDeu should not be used for structure learning from sparse data, since it violates the maximum relative entropy principle; and that it is also problematic from a more classic Bayesian model selection perspective, because it produces Bayes factors that are sensitive to the value of its only hyperparameter. In a large simulation study in our previous work (Scutari, 2016), we found that the Bayesian Dirichlet sparse (BDs) score seems to provide better accuracy in structure learning; in this paper we further show that BDs does not suffer from the issues above, and we recommend using it for sparse data instead of BDeu. Finally, we show that these issues are in fact different aspects of the same problem and a consequence of the distributional assumptions of the prior. |
Tasks | Bayesian Inference, Model Selection |
Published | 2017-08-02 |
URL | http://arxiv.org/abs/1708.00689v5 |
http://arxiv.org/pdf/1708.00689v5.pdf | |
PWC | https://paperswithcode.com/paper/dirichlet-bayesian-network-scores-and-the |
Repo | |
Framework | |
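The BDeu score discussed above can be written down compactly: it is the log marginal likelihood of a node's data under a uniform Dirichlet prior whose imaginary sample size (BDeu's only hyperparameter, whose sensitivity the paper analyzes) is spread evenly over parent configurations and states. A minimal sketch with illustrative variable names:

```python
from math import lgamma
from collections import Counter

def bdeu_local_score(data, node, parents, r, iss=1.0):
    """BDeu local score (log marginal likelihood) for one node.
    `data` is a list of dicts mapping variable -> state, `r` maps each
    variable to its number of states, `iss` is the imaginary sample size."""
    q = 1
    for p in parents:
        q *= r[p]                    # number of parent configurations
    a_ij = iss / q                   # prior mass per parent configuration
    a_ijk = iss / (q * r[node])      # prior mass per (configuration, state) cell
    nij, nijk = Counter(), Counter()
    for row in data:
        j = tuple(row[p] for p in parents)
        nij[j] += 1
        nijk[(j, row[node])] += 1
    # Unobserved configurations contribute lgamma(a) - lgamma(a) = 0,
    # so summing over observed counts only is exact.
    score = sum(lgamma(a_ij) - lgamma(a_ij + n) for n in nij.values())
    score += sum(lgamma(a_ijk + n) - lgamma(a_ijk) for n in nijk.values())
    return score

# Toy check: with perfectly correlated binary data, conditioning B on A
# should score higher than modeling B with no parents.
data = [{"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 1, "B": 1}, {"A": 1, "B": 1}]
r = {"A": 2, "B": 2}
s_dep = bdeu_local_score(data, "B", ["A"], r)
s_indep = bdeu_local_score(data, "B", [], r)
```

Because the score is a log probability of the observed data, it is always negative for a non-empty sample; the paper's sparse-data critique concerns how `iss / (q * r)` shrinks as the parent set grows.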