
The only problem is that Metropolis is no longer actively maintained. The latest update (at the time of writing) was six years ago, and since then a number of issues have cropped up. Most are not major and can be circumvented through various hacks and workarounds, but I have grown increasingly frustrated with the separate file of Metropolis patches that I’ve had to keep around to fix these issues. Beamer itself even includes several patches now to stop the theme from breaking (see here, here, and here).

This has now (since some months back, actually) led me to fork Metropolis to try to fix these outstanding issues. I call the new theme *Moloch* (a name that is likely familiar to you if you know your Metropolis). The original design is still pretty much intact save for a few tweaks that I will summarize in Section 3, which overall bring the theme closer to normal beamer behavior. The code for the theme has also been simplified and made more robust. Metropolis, for instance, made much use of `\patchcmd` from the etoolbox package to patch beamer theme internals in order to support modifications to, for instance, frame titles. This was what led the theme to break in the first place as beamer introduced changes in these commands, and I have thus opted to remove all of this kind of patching in favor of relying on standard beamer functionality instead.

In fact, it is now possible to change the title format directly through beamer, for instance by calling `\setbeamertemplate{frametitle}{\MakeUppercase{\insertframetitle}}` to make titles uppercase.^{1}

This comes at the price of sacrificing some features, such as toggling title formatting between uppercase, small caps, and regular text. But, as the Metropolis documentation itself noted,^{2} these modifications were problematic in the first place and I therefore think that their removal is on the whole a good thing.

I’ve also removed the pgfplots theme that was included in Metropolis. I don’t mind the theme per se, but I don’t think it belongs in a general-purpose beamer theme.

The design of the theme does not stray far from the original Metropolis design (and will not do so in the future either). Below is a simple example of a few slides made with the theme.

Moloch is now on CTAN, so you can install it with the TeX Live package manager by calling the following line of code:

`tlmgr install moloch`

If you have TeX Live 2024 (or later), then Moloch is already included in the distribution and you don’t have to do anything to install it.

Using the theme is as simple as using any other beamer theme. Here is a simple example:

```
\documentclass{beamer}
\usetheme{moloch}
\title{Your Title}
\author{Your Name}
\institute{Your Institute}
\date{\today}
\begin{document}
\maketitle
\section{First Section}
\begin{frame}
\frametitle{First Frame}
Hello, world!
\end{frame}
\end{document}
```

See the package documentation to learn more about the theme and its various options. If you’re used to Metropolis, then you mostly need to know that `\metroset` has been replaced by `\molochset` and that some things are no longer supported, which is precisely what we’ll dig into in the next section!

I’ve tried to outline the main changes that I can think of in the following sections.

I always thought the green color in Metropolis was lurid and not exactly color-blind friendly. I therefore changed it to a teal color that I think is a little more subdued and easier on the eyes. You can see the difference in the figure below. I hope you agree that the new color is an improvement!

Subtitles are now supported in Moloch. They were not supported in Metropolis because the author thought subtitles were a bad idea in general. On the whole I agree that subtitles are usually best avoided, but I didn’t see any reason to impose this opinion on others, so Moloch supports them.

Metropolis sported its own frame numbering system. There was nothing wrong with this system except that it necessitated a separate package (appendixnumberbeamer) to get frame/page numbers to restart (and not count towards the total) for appendix slides. Beamer has, however, improved its own system in recent years, and there is no longer any need for a custom solution (or separate package) to support this functionality. As a result, Moloch just relies on beamer templates for the frame numbering. The design is *slightly* different, with a smaller font size and somewhat different margins, but I think this is only for the better anyway.

Now, you can just change it via the standard beamer commands for frame numbering, like so:

`\setbeamertemplate{page number in head/foot}[appendixframenumber]`

The title page has been redesigned. The primary changes are the following.

- The institute is now positioned below the author (rather than the date), which I think makes more sense since the author and institute are more closely related (in my mind at least). This was suggested in an issue on the Metropolis repo, but never adopted.
- The titlegraphic now has margins added above and below. It was previously put in a `vbox` of zero height, which meant that it basically didn’t affect the page layout. Now it does and will push the titles and other content down. The new layout gives equal margins between the top and bottom of the frame and the content, and adapts to the size of the title graphic. This may or may not be what you want, but in the latter case you can just wrap the graphic in a `vbox` of zero height yourself, so I see this as a less invasive default.
- The margins around the elements on the title page have changed everywhere. Please see the screenshots below to see what I mean, but the main change is that there is less spacing between the title and the subtitle, and even spacing above and below the orange line.

For reference, the code for generating the slides is given below.

```
\documentclass[10pt]{beamer}
\usetheme{moloch}
\title{Title}
\subtitle{Subtitle}
\author{Author}
\institute{Institute}
\date{\today}
\titlegraphic{\hfill\includegraphics[height=2cm]{logo.pdf}}
\begin{document}
\maketitle
\end{document}
```

I am open to suggestions and discussions on how to further improve the title page layout, or on how to make customizing it easier.

Metropolis includes special handling of font settings. If you use LuaTeX or XeTeX, then Metropolis automatically checks if the Fira Sans font is available and sets it up for you. I like the Fira fonts myself and think that they are a nice choice for presentations, but I do not think that they should be set as part of the theme, especially since this means that you get different output by default if you run your document through pdfTeX instead, which I think is undesirable.

I’ve therefore disabled these font settings, but if you want to replicate the look of Metropolis when it comes to the fonts as well, then all you need to do is use XeTeX or LuaTeX and set your font options according to the following example (or something similar).

```
\usepackage{fontspec}
\setsansfont[
  ItalicFont={Fira Sans Light Italic},
  BoldFont={Fira Sans},
  BoldItalicFont={Fira Sans Italic}
]{Fira Sans Light}
\setmonofont[BoldFont={Fira Mono Medium}]{Fira Mono}
\AtBeginEnvironment{tabular}{%
  \addfontfeature{Numbers={Monospaced}}
}
```

If you want to have `\operatorname`, `\mathrm`, and company in the Fira font as well, then you’ll need to use `\setmainfont` too.

Note that there is only a beta version of the Fira Math Light font available and that it is not at all complete, so unfortunately there is no good way to get a matching math font for Fira Sans Light at the moment. (Otherwise we could use **unicode-math** and `Fira Math Light`.) This, I think, is another good argument for why Fira should not be set as the default font for the theme.

Unlike standard beamer, in which `\parskip` (paragraph spacing, roughly speaking) is set to zero, Metropolis instead sets it to 0.5 em. This means that in Metropolis, you don’t need to sprinkle `\medskip` (or whatever you use for paragraph spacing) throughout your slides to have paragraphs neatly separated.

As I noted in this issue and also this one, however, this has some undesirable side effects,^{3} such as introducing additional spacing between table captions and the tables themselves. As a consequence, I’ve removed this setting from Moloch.

As with many other changes, this puts Moloch more in line with the standard Beamer behavior, which I think is generally speaking a good thing and simplifies switching between themes.

Metropolis introduced a bit of custom logic to handle block environments. In particular, filled block environments were modified such that the main body text (for the frame) aligns with the boundaries of the box and not with the text inside the box (which is the default behavior in beamer). See below for a comparison.

I think the proper choice is the default beamer behavior, especially since the Metropolis approach means that the content inside the blocks looks different if you switch to filled blocks. In addition, the spacing for the normal block environments in Metropolis does not work properly, so switching to the default behavior also solves this issue.

Moloch is on CTAN and included in TeX Live 2024, so you typically do not need to concern yourself with installing the theme from source. But if you want to do so nonetheless, for instance to get a new feature or fix from the development version, note that Moloch now uses l3build instead of a custom Makefile to handle the build process, which should make life easier for most people. You just need to call these lines:

```
git clone https://github.com/jolars/moloch.git
cd moloch
l3build install
```

It also means that the package now includes unit tests to make sure that nothing in the theme breaks unexpectedly.

There are several other small changes. I’ve tried to list some of them below.

I currently don’t foresee any major changes to the theme and will likely upgrade it to a stable state in the near future. So you can count on the theme not to introduce any breaking changes. I think the original Metropolis design is great and I don’t want to stray too far from it.

That being said, one thing that I do want to do is make the colors in the theme easier to customize and perhaps introduce alternative color schemes. That also means bringing back the hi-contrast theme that was in Metropolis but that I removed from Moloch (for reasons that I can’t quite recall now). In any case, I don’t intend to modify the default choices.

If you feel that you can contribute, then please do! The project is on GitHub and you are welcome to raise an issue or start a new discussion if there’s anything you think could be improved.

samcarter helped out a lot with discussions and testing of the theme and also helped make transitioning from Metropolis to Moloch smoother. She will actually give a talk about the theme at the TUG 2024 meeting in Prague, July 19–21, so please check it out if you have the chance!

Finally: credit where credit is due. I want to stress that the vast majority of the code in Moloch was written by Matthias Vogelsang, who created the Metropolis package, and that my job has mostly been to patch up its rough spots.

Thanks samcarter for informing me of this!↩︎

And see this issue for instance.↩︎

Also see this issue on the Beamer repo for more background.↩︎

My PhD thesis is now published and available for download (Larsson 2024)! It is the culmination of five years of research on optimization and related numerical algorithms for sparse regression, in particular the lasso and sorted ℓ₁ penalized estimation (SLOPE).

In the following sections I will give an overview of the papers that are included in the thesis. This is a somewhat abridged version of the paper summary section in the actual thesis and comes without a lengthy introduction to the field and these topics. If you are interested in that, I suggest you read the thesis itself!

The first of the papers introduces *the strong screening rule for SLOPE* (Larsson, Bogdan, and Wallin 2020), which is the first screening rule for SLOPE. If you haven’t heard about screening rules before, they are algorithms that discard features (predictors/variables) prior to fitting the model. They are remarkably effective for sparse methods in the high-dimensional setting and typically offer speedups of several orders of magnitude. They were first discovered for the lasso by El Ghaoui, Viallon, and Rabbani (2010) and have since proven to be a key ingredient in making the lasso computationally efficient.

They are based on the following reasoning:

- The solution to a sparse regression problem is, of course, *sparse*, particularly when the number of features ($p$) outnumbers the number of observations ($n$). For the lasso, for instance, the size of the support *must* in fact be no larger than $n$.
- We can often guess quite accurately which features have little chance of being in the support, for instance by looking at the correlation between the features and the response, or at the solution to a problem with a larger (or smaller) penalty.^{1}
- Even if we are wrong about which features are in the support, it is typically cheap to check if we made a mistake and refit with these features added back in.

This reasoning turns out to be pretty much spot-on, and as a result screening rules have turned out to be critical for good performance for the lasso and related methods.

Screening rules are typically separated into *safe* and *heuristic* rules. Safe rules guarantee that discarded features are in fact not in the optimal solution, whereas heuristic rules do not. This division is something of a misnomer, however, since it is easy to check optimality conditions after fitting the model on the reduced set of features, catch any mistakes, and refit if necessary. And because safe rules sacrifice effectiveness for safety, and the optimality checks are not very expensive, it is my experience that heuristic rules typically offer better performance. The two can even be used together.
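To make this concrete, here is what such an optimality check looks like for the lasso, sketched in Python (the function name and tolerance are my own, for illustration only):

```python
import numpy as np

def lasso_kkt_violations(X, y, beta, lam, tol=1e-9):
    # KKT conditions for the lasso: at an optimum, |x_j'(y - X beta)| <= lam
    # must hold for every feature j with beta_j = 0. Any feature violating
    # this was wrongly discarded and should be added back before refitting.
    c = X.T @ (y - X @ beta)
    return np.flatnonzero((beta == 0) & (np.abs(c) > lam + tol))
```

Any indices returned here are simply added back to the feature set and the model is refit, which is what makes heuristic rules reliable to use in practice.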

The first heuristic screening rule for the lasso was introduced by Tibshirani et al. (2012): *the strong screening rule*. In the first paper of my thesis, we extend this screening strategy to the problem of solving sorted ℓ₁ penalized regression (SLOPE) (Bogdan et al. 2015).
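For the lasso, the strong rule has a particularly simple form: when moving from penalty $\lambda_{k-1}$ to $\lambda_k$, it discards feature $j$ whenever $|x_j^\top r| < 2\lambda_k - \lambda_{k-1}$, where $r$ is the residual at the previous solution. A rough Python sketch (illustrative only, not the paper’s implementation):

```python
import numpy as np

def strong_rule_keep(X, residual, lam_new, lam_prev):
    # Strong rule for the lasso (Tibshirani et al. 2012): keep feature j
    # only if |x_j' r| >= 2 * lam_new - lam_prev; the rest are screened out
    # before fitting (and validated via the KKT conditions afterwards).
    c = np.abs(X.T @ residual)
    return c >= 2 * lam_new - lam_prev
```

The rule is heuristic because the inner products can change by more than the penalty decrement between path steps, which is exactly why the post-fit optimality check is needed.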

I have provided some results from the first paper in Table 1. As you can see, screening improves performance considerably and offers no computational overhead even when it has little effect (as in the case of the physician data set).

| Dataset     | Model         | $n$  | $p$    | Time (no screening) | Time (screening) |
|-------------|---------------|------|--------|---------------------|------------------|
| dorothea    | Logistic      | 800  | 88119  | 914                 | 14               |
| e2006-tfidf | Least squares | 3308 | 150358 | 43353               | 4944             |
| news20      | Multinomial   | 1000 | 62061  | 5485                | 517              |
| physician   | Poisson       | 4406 | 25     | 34                  | 34               |

Screening rules are particularly effective when they are sequential, that is, when they operate along the regularization path.^{2} But another possibility that had previously not been explored is the idea of screening not only for the next step on the path, but for *all* of the remaining steps as well. This is the idea behind *look-ahead screening rules*, which I introduce in the second (and short) paper of the thesis (Larsson 2021). We use the Gap-Safe screening rule (Ndiaye et al. 2017) here. As the name suggests, it is a safe screening rule: if a feature is screened out, it is guaranteed to be zero in the solution.

As I show in the paper, the results are quite promising (Figure 2), especially since you get this kind of screening essentially for free (if you’re screening anyway).

Even though the strong rule for the lasso is highly effective in general, there is one area in which it struggles, namely, when features are highly correlated. Tibshirani et al. (2012) in fact noted this themselves and for this reason motivated a modified technique: the working-set strategy, where the model is initially fit using the ever-active set rather than the strong set.

The reason for this is that the strong rule (and every other screening rule we know of) ignores information about how close the features are to becoming active. This is the motivation for the *Hessian screening rule* that we introduce in the third paper of the thesis (Larsson and Wallin 2022). The name stems from the fact that we use second-order information about the optimization problem, which involves the Hessian matrix. The rule offers a better estimate of the correlation vector, which in practice leads to better screening performance.

An ongoing problem in the literature on optimization (including screening rules) is that there are now so many methods to examine, and so many different models and datasets on which to compare them, that it has become difficult to keep track of which methods actually do best on a given problem. You can easily find a paper A that studies optimization methods X and Y on datasets I and II and concludes that X is better than Y, but then find another paper B, which studies methods X, Y, and Z on datasets I and III and concludes that, actually, Y is better than X and, by the way, Z happens to be best of them all. Then, later, you find paper C, which claims that Z is actually considerably worse than X, which in fact also performs better on dataset IV. This confused state of affairs is typically the result of authors having benchmarked their methods using different hardware, programming languages, hyperparameters, and convergence criteria, to name a few of the many possible sources of variation.

In short, there is a dire need for a framework through which this process can be made simple, reproducible, and transparent. This is the motivation behind the **benchopt** package, which we present in the fourth of this thesis’ papers (Moreau et al. 2022).

The goal of benchopt is to make life easier for both researchers in optimization and users of optimization software. For a researcher who has developed a new optimization method for SLOPE, for instance, all you need to do is write the code for your solver (optimization method), plug it into the existing benchopt benchmark for SLOPE, and run it. The package will then automatically compare your method with all the other methods in the benchmark and output tables and plots of the results (Figure 3). If you instead are a user who is interested in using SLOPE for your applied work and want to know which algorithm to use, you can either browse the extensive database of results that other users have already uploaded or just download the benchmark and run it yourself on the data that you are interested in.

Proximal coordinate descent is a very efficient optimization algorithm for fitting the lasso, but it cannot handle the case when the penalty term is non-separable, which is the case in SLOPE. In practice, this has reduced the applicability of SLOPE to large data, which is unfortunate given the many appealing properties of the model.

In paper 5 (Larsson et al. 2023), however, we present a way to circumvent this issue by using a hybrid of proximal coordinate and proximal gradient descent. Our main discovery is that if we fix the clusters and optimize over each cluster in turn, rather than each feature, the problem becomes separable, which means that coordinate descent can be used. And if we combine this with proximal gradient descent steps, which allow us to discover the clusters, then we can guarantee convergence and at the same time benefit from the efficiency of coordinate descent.

The solver is illustrated for a two-dimensional SLOPE problem in Figure 4. The orange cross marks the optimum. Dashed lines indicate PGD steps and solid lines CD steps. Each dot marks a complete epoch, which may correspond to only a single coefficient update for the CD and hybrid solvers if the coefficients flip order. The CD algorithm converges quickly but is stuck after the third epoch. The hybrid and PGD algorithms, meanwhile, reach convergence after 67 and 156 epochs respectively.
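At the heart of the PGD steps (and of the clustering behavior itself) is the proximal operator of the sorted ℓ₁ norm, which reduces to an isotonic regression problem (Bogdan et al. 2015). Here is a rough Python sketch using a stack-based pool-adjacent-violators pass (illustrative only, not the paper’s optimized solver):

```python
import numpy as np

def prox_sorted_l1(v, lam):
    # Prox of the sorted l1 norm: sort |v| in decreasing order, subtract the
    # (non-increasing) lam sequence, project onto the set of non-increasing
    # vectors (pool adjacent violators), clip at zero, then undo the sort
    # and restore the signs.
    order = np.argsort(np.abs(v))[::-1]
    w = np.abs(v)[order] - lam
    vals, sizes = [], []
    for x in w:
        vals.append(float(x))
        sizes.append(1)
        # merge blocks while the sequence of block means is increasing
        while len(vals) > 1 and vals[-2] <= vals[-1]:
            x2, s2 = vals.pop(), sizes.pop()
            vals[-1] = (vals[-1] * sizes[-1] + x2 * s2) / (sizes[-1] + s2)
            sizes[-1] += s2
    out = np.empty_like(w)
    i = 0
    for val, size in zip(vals, sizes):
        out[i:i + size] = max(val, 0.0)
        i += size
    res = np.empty_like(out)
    res[order] = out
    return np.sign(v) * res
```

Note how coefficients whose penalized magnitudes would flip order get averaged into a cluster: for example, $v = (1, 1.5)$ with $\lambda = (1, 0.1)$ yields $(0.7, 0.7)$.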

The final paper of the thesis is a working paper in which we tackle the issue of normalization of binary features. Normalization is necessary in order to put features on the same scale when dealing with regularized methods. What “same scale” means, however, is not clear, and the question has mostly been met with neglect in the literature. We think that this is both surprising and problematic given the almost universal use of normalization in regularized methods and the large, apparent effects it has on the solution paths.

In our paper, we begin to bridge this knowledge gap by studying normalization for the lasso and ridge regression when they are used on binary features (features that only contain values 0 or 1) or a mix of binary and normally distributed features. What we find is that normalization has a large effect with respect to the class balance of the features: the proportion of ones to zeros (or vice versa). Both the lasso and ridge estimators turn out to be sensitive to this class balance and, depending on the type of normalization used, have trouble recovering effects associated with binary features when their class balance is severe enough (Figure 4).
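One way to see why class balance matters: standardization divides a binary feature by its standard deviation, which for class balance $q$ (the fraction of ones) is $\sqrt{q(1 - q)}$. A small Python illustration (the function is mine, purely for exposition):

```python
import math

def binary_feature_sd(q):
    # Standard deviation of a binary feature in which a fraction q of the
    # entries are ones: sqrt(q * (1 - q)). Standardizing divides the
    # feature by this value, so a severely imbalanced feature (small q)
    # gets inflated, which changes how strongly the penalty bites.
    return math.sqrt(q * (1 - q))
```

For $q = 0.5$ this is $0.5$, while for $q = 0.01$ it is about $0.0995$, so the standardized feature (and hence the effective penalization of its coefficient) is rescaled by a factor of roughly five between the two cases.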

I will offer more details on this paper once work on it has been completed, but I think the results are interesting and that this field is ripe for further exploration.

Bogdan, Małgorzata, Ewout van den Berg, Chiara Sabatti, Weijie Su, and Emmanuel J. Candès. 2015. “SLOPE – Adaptive Variable Selection via Convex Optimization.” *The Annals of Applied Statistics* 9 (3): 1103–40. https://doi.org/10.1214/15-AOAS842.

El Ghaoui, Laurent, Vivian Viallon, and Tarek Rabbani. 2010. “Safe Feature Elimination for the LASSO and Sparse Supervised Learning Problems.” *arXiv:1009.4219 [Cs, Math]*, September. https://arxiv.org/abs/1009.4219.

Larsson, Johan. 2021. “Look-Ahead Screening Rules for the Lasso.” In *22nd European Young Statisticians Meeting - Proceedings*, edited by Andreas Makridis, Fotios S. Milienos, Panagiotis Papastamoulis, Christina Parpoula, and Athanasios Rakitzis, 61–65. Athens, Greece: Panteion university of social and political sciences. https://www.eysm2021.panteion.gr/files/Proceedings_EYSM_2021.pdf.

———. 2024. “Optimization and Algorithms in Sparse Regression: Screening Rules, Coordinate Descent, and Normalization.” PhD thesis, Lund, Sweden: Department of Statistics, Lund University. https://lup.lub.lu.se/record/0b9c97e8-5f65-43eb-9f7a-c4f237568370.

Larsson, Johan, Małgorzata Bogdan, and Jonas Wallin. 2020. “The Strong Screening Rule for SLOPE.” In *Advances in Neural Information Processing Systems 33*, edited by Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, 33:14592–603. Virtual: Curran Associates, Inc. https://papers.nips.cc/paper_files/paper/2020/hash/a7d8ae4569120b5bec12e7b6e9648b86-Abstract.html.

Larsson, Johan, Quentin Klopfenstein, Mathurin Massias, and Jonas Wallin. 2023. “Coordinate Descent for SLOPE.” In *Proceedings of the 26th International Conference on Artificial Intelligence and Statistics*, edited by Francisco Ruiz, Jennifer Dy, and Jan-Willem van de Meent, 206:4802–21. Proceedings of Machine Learning Research. Valencia, Spain: PMLR. https://proceedings.mlr.press/v206/larsson23a.html.

Larsson, Johan, and Jonas Wallin. 2022. “The Hessian Screening Rule.” In *Advances in Neural Information Processing Systems 35*, edited by Sanmi Koyejo, Sidahmed Mohamed, Alekh Agarwal, Danielle Belgrave, Kyunghyun Cho, and Alice Oh, 35:15823–35. New Orleans, USA: Curran Associates, Inc. https://papers.nips.cc/paper_files/paper/2022/hash/65a925049647eab0aa06a9faf1cd470b-Abstract-Conference.html.

Moreau, Thomas, Mathurin Massias, Alexandre Gramfort, Pierre Ablin, Pierre-Antoine Bannier, Benjamin Charlier, Mathieu Dagréou, et al. 2022. “Benchopt: Reproducible, Efficient and Collaborative Optimization Benchmarks.” In *Advances in Neural Information Processing Systems 35*, edited by S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, 35:25404–21. New Orleans, USA: Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2022/hash/a30769d9b62c9b94b72e21e0ca73f338-Abstract-Conference.html.

Ndiaye, Eugene, Olivier Fercoq, Alexandre Gramfort, and Joseph Salmon. 2017. “Gap Safe Screening Rules for Sparsity Enforcing Penalties.” *Journal of Machine Learning Research* 18 (128): 1–33. https://jmlr.org/papers/v18/16-577.html.

Tibshirani, Robert, Jacob Bien, Jerome Friedman, Trevor Hastie, Noah Simon, Jonathan Taylor, and Ryan J. Tibshirani. 2012. “Strong Rules for Discarding Predictors in Lasso-Type Problems.” *Journal of the Royal Statistical Society. Series B: Statistical Methodology* 74 (2): 245–66. https://doi.org/c4bb85.

This is typically the case when we are fitting a regularization path. We start with a penalty that’s large enough to make every coefficient zero and then progressively increase it.↩︎

The regularization path starts at the point where all of the model’s coefficients are zero and proceeds until they are almost not penalized at all.↩︎

I’ve stolen the idea for the package from **here** and **pyprojroot**, R and Python packages that I have used frequently in the past to simplify file referencing in research projects, particularly when referencing data files or creating plot files.

The package is on the general Julia registry, so it can be installed in Julia by calling

`]add ProjectRoot`

The package is designed to be light on dependencies and carries only a single exported macro, `@projectroot`. Its usage is simple. Consider the following directory structure.

```
MyProject
├── scripts
│ └── A.jl
├── src
│ └── B.jl
└── Project.toml
```

If you want to refer to a file, say `src/B.jl`, you simply use `@projectroot("src", "B.jl")` anywhere in your project, for instance in `scripts/A.jl`.

The `@projectroot` macro starts from the file where it is called and then recursively searches upwards in the file hierarchy until it finds one of the following:

- A `.projectroot` file
- A `Project.toml` file
- A `JuliaProject.toml` file
- A `Manifest.toml` file
- A `.git` folder
- An `.svn` folder

The search terminates when one of these files or folders is found, or when the root of the file system is reached, and the resulting directory is what `@projectroot` returns.
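The search logic itself is easy to express in any language. Here is a rough Python equivalent of what the macro does (the names are my own; the actual package is a Julia macro that resolves the path at macro-expansion time):

```python
import os

# Marker files and folders that identify a project root.
MARKERS = (".projectroot", "Project.toml", "JuliaProject.toml",
           "Manifest.toml", ".git", ".svn")

def find_project_root(start):
    # Walk upwards from `start` until a directory containing one of the
    # markers is found, or the filesystem root is reached.
    path = os.path.abspath(start)
    while True:
        if any(os.path.exists(os.path.join(path, m)) for m in MARKERS):
            return path
        parent = os.path.dirname(path)
        if parent == path:  # filesystem root: give up and return it
            return path
        path = parent
```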

String interpolation is also supported, so you can use

```
file = "B.jl"
@projectroot("src") * "/$(file)"
```

and so on.

Calling `@projectroot` from the REPL uses the same logic as above, but the search starts from the current working directory instead.

There is already similar functionality in the excellent DrWatson package. But I generally subscribe to the Unix philosophy (Doug McIlroy):

> Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”.

So if you just need a lightweight package in the same spirit as `here` and `pyprojroot`, then `ProjectRoot` might just be the right tool for you.

As always, I am happy for any kind of contribution. This is my first Julia package and I still haven’t wrapped my head around all the intricacies of Julia and its package ecosystem. So if you have any suggestions, please let me know. The source code is stored in the GitHub repository at jolars/ProjectRoot.jl and you can find the documentation for the latest stable version here.

SLOPE (Bogdan et al. 2015) stands for sorted L1 penalized estimation and is a generalization of OSCAR (Bondell and Reich 2008). As the name suggests, SLOPE is a type of $\ell_1$-regularization. More specifically, SLOPE fits generalized linear models regularized with the sorted $\ell_1$ norm. The objective in SLOPE is

$$\operatorname{minimize}_\beta \; f(\beta) + J(\beta; \lambda),$$

where $f(\beta)$ is typically the negative log-likelihood of some model in the family of generalized linear models and

$$J(\beta; \lambda) = \sum_{j=1}^p \lambda_j |\beta|_{(j)}, \qquad \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0,$$

is the sorted $\ell_1$ norm, where $|\beta|_{(1)} \geq |\beta|_{(2)} \geq \cdots \geq |\beta|_{(p)}$ denote the absolute values of the coefficients in non-increasing order.
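Computing the sorted $\ell_1$ norm itself is straightforward: sort the absolute coefficients in decreasing order and take the inner product with the $\lambda$ sequence. A quick Python illustration (not taken from any SLOPE implementation):

```python
import numpy as np

def sorted_l1_norm(beta, lam):
    # J(beta; lam) = sum_j lam_j * |beta|_(j), where |beta|_(j) is the
    # j-th largest absolute coefficient and lam is non-increasing.
    return float(np.dot(lam, np.sort(np.abs(beta))[::-1]))
```

With $\beta = (1, -3, 2)$ and $\lambda = (3, 2, 1)$, for example, this gives $3 \cdot 3 + 2 \cdot 2 + 1 \cdot 1 = 14$.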

Some people will note that this penalty is a generalization of the standard $\ell_1$ norm penalty.^{1} As such, SLOPE is a type of sparse regression, just like the lasso. Unlike the lasso, however, SLOPE gracefully handles correlated features. Whereas the lasso often discards all but a few among a set of correlated features (Jia and Yu 2010), SLOPE instead *clusters* such features together by setting their coefficients equal in absolute value.

SLOPE 0.2.0 is a new version of the R package SLOPE, featuring a range of improvements over the previous version. If you are completely new to the package, please start with the introductory vignette.

Previously, SLOPE featured only ordinary least-squares regression. Now the package features logistic, Poisson, and multinomial regression on top of that. Just as in other similar packages, this is enabled simply by setting, for instance, `family = "binomial"` for logistic regression.

```
library(SLOPE)
fit <- SLOPE(wine$x, wine$y, family = "multinomial")
```

By default, SLOPE now fits a full regularization path instead of a single penalty sequence at a time. This behavior is now analogous to the default behavior of glmnet.

`plot(fit)`

The package now uses predictor screening rules to vastly improve performance in the high-dimensional domain. Screening rules are part of what makes other related packages such as glmnet so efficient. In SLOPE, we use a variant of the strong screening rules for the lasso (Tibshirani et al. 2012).

```
xy <- SLOPE:::randomProblem(100, 1000)
system.time({SLOPE(xy$x, xy$y, screen = TRUE)})
```

```
user system elapsed
1.198 0.004 0.159
```

`system.time({SLOPE(xy$x, xy$y, screen = FALSE)})`

```
user system elapsed
2.781 0.006 0.364
```

There is now a function `trainSLOPE()`, which can be used to run cross-validation for optimal selection of `sigma` and `q`. Here, we run 8-fold cross-validation repeated 5 times.

```
# 8-fold cross-validation repeated 5 times
tune <- trainSLOPE(
subset(mtcars, select = c("mpg", "drat", "wt")),
mtcars$hp,
q = c(0.1, 0.2),
number = 8,
repeats = 5
)
plot(tune)
```

In addition, the package now also features a function `caretSLOPE()` that can be used via the excellent caret package, which enables a swath of resampling methods and comparisons.

All of the performance-critical code for SLOPE has been rewritten in C++. In addition, the package now features an ADMM solver for `family = "gaussian"`, enabled by setting `solver = "admm"` in the call to `SLOPE()`. Preliminary testing shows that this solver is faster for many designs, particularly when there is high correlation among predictors.

SLOPE now also allows sparse design matrices of classes from the Matrix package.

For a full list of changes, please see the changelog.

Bogdan, Małgorzata, Ewout van den Berg, Chiara Sabatti, Weijie Su, and Emmanuel J. Candès. 2015. “SLOPE – Adaptive Variable Selection via Convex Optimization.” *The Annals of Applied Statistics* 9 (3): 1103–40. https://doi.org/10.1214/15-AOAS842.

Bondell, Howard D., and Brian J. Reich. 2008. “Simultaneous Regression Shrinkage, Variable Selection, and Supervised Clustering of Predictors with OSCAR.” *Biometrics* 64 (1): 115–23. https://doi.org/10.1111/j.1541-0420.2007.00843.x.

Jia, J., and B. Yu. 2010. “On Model Selection Consistency of the Elastic Net When p n.” *Statistica Sinica* 20 (2): 595–611.

Tibshirani, Robert, Jacob Bien, Jerome Friedman, Trevor Hastie, Noah Simon, Jonathan Taylor, and Ryan J. Tibshirani. 2012. “Strong Rules for Discarding Predictors in Lasso-Type Problems.” *Journal of the Royal Statistical Society. Series B: Statistical Methodology* 74 (2): 245–66. https://doi.org/c4bb85.

Simply set $\lambda_j = \lambda$ for all $j$ and you get the lasso penalty.↩︎

The purpose of my R package eulerr is to fit and *visualize* Euler diagrams. Besides the various intricacies involved in fitting the diagrams, there are many interesting problems involved in their visualization. One of these is the labeling of the overlaps.

Naturally, simply positioning the labels at the shapes’ centers fails more often than not. Nevertheless, this strategy is employed by **venneuler**, for instance, and the plots usually demand manual tuning.

```
# an example set combination
s <- c(
"SE" = 13,
"Treat" = 28,
"Anti-CCP" = 101,
"DAS28" = 91,
"SE&Treat" = 1,
"SE&DAS28" = 14,
"Treat&Anti-CCP" = 6,
"SE&Anti-CCP&DAS28" = 1
)
library(venneuler, quietly = TRUE)
fit_venneuler <- venneuler(s)
plot(fit_venneuler)
```

Until now, I solved this in **eulerr** by, for each overlap, filling one of the involved shapes (circles or ellipses) with points and then numerically optimizing the label’s location using a Nelder–Mead optimizer. However, given that finding the distance between a point and an ellipse—at least a rotated one—itself requires a numerical solution (Eberly 2013), this procedure turned out to be quite inefficient.

R has powerful functionality for plotting in general but lacks capabilities for drawing ellipses as curves. High-resolution polygons are thankfully a readily available remedy and have been used in **eulerr** for several versions now.

The upside of using polygons is that they are usually much easier, even if sometimes inefficient, to work with. For instance, they make constructing separate shapes for each overlap a breeze using the polyclip package (Johnson and Baddeley 2018).

And because basically all shapes in digital maps are polygons, there exists a multitude of other useful tools for a wide variety of polygon-related tasks. One of these turned out to be precisely what I needed: polylabel (Mapbox [2016] 2018) from the Mapbox suite. Because the library has already been explained in detail elsewhere, I will only note briefly that it uses quadtree binning to divide the polygon into square bins, pruning away dead ends. It is efficient and will, according to the authors, find a point that is “guaranteed to be a global optimum within the given precision”.
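The quadtree details aside, the problem polylabel solves can be sketched in base R with a brute-force grid search: test each grid point for insideness via ray casting and keep the interior point farthest from every edge. This is only an illustration — it ignores holes and is far slower than polylabel; all function names here are mine.

```
# Naive "pole of inaccessibility" by brute-force grid search. Only an
# illustration of what polylabel does efficiently with quadtrees.

# even-odd ray-casting test: is (px, py) inside the polygon?
in_polygon <- function(px, py, x, y) {
  n <- length(x)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    if ((y[i] > py) != (y[j] > py) &&
        px < (x[j] - x[i]) * (py - y[i]) / (y[j] - y[i]) + x[i]) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# distance from (px, py) to the segment (x1, y1)-(x2, y2)
seg_dist <- function(px, py, x1, y1, x2, y2) {
  dx <- x2 - x1
  dy <- y2 - y1
  tt <- if (dx == 0 && dy == 0) 0 else
    max(0, min(1, ((px - x1) * dx + (py - y1) * dy) / (dx^2 + dy^2)))
  sqrt((px - (x1 + tt * dx))^2 + (py - (y1 + tt * dy))^2)
}

# evaluate every grid point; keep the interior point farthest from any edge
pole_naive <- function(x, y, res = 50) {
  grid <- expand.grid(
    px = seq(min(x), max(x), length.out = res),
    py = seq(min(y), max(y), length.out = res)
  )
  n <- length(x)
  best <- c(NA_real_, NA_real_)
  best_d <- -Inf
  for (k in seq_len(nrow(grid))) {
    px <- grid$px[k]
    py <- grid$py[k]
    if (!in_polygon(px, py, x, y)) next
    d <- min(vapply(seq_len(n), function(i) {
      j <- if (i == n) 1 else i + 1
      seg_dist(px, py, x[i], y[i], x[j], y[j])
    }, numeric(1)))
    if (d > best_d) {
      best_d <- d
      best <- c(px, py)
    }
  }
  list(x = best[1], y = best[2], dist = best_d)
}

# the pole of a unit square is its center, 0.5 away from every edge
pole_naive(c(0, 1, 1, 0), c(0, 0, 1, 1), res = 51)
```

polylabel’s quadtree prunes away most of this grid: any cell whose best possible distance cannot beat the current best is discarded without being subdivided.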

Because it appeared to be such a valuable tool for R users, I decided to write a wrapper for the C++ header of polylabel and bundle it as an R package.

```
# install.packages("polylabelr")
library(polylabelr)

# a concave polygon with a hole
x <- c(0, 6, 3, 9, 10, 12, 4, 0, NA, 2, 5, 3)
y <- c(0, 0, 1, 3, 1, 5, 3, 0, NA, 1, 2, 2)

# locate the pole of inaccessibility
p <- poi(x, y, precision = 0.01)

plot.new()
plot.window(
  range(x, na.rm = TRUE),
  range(y, na.rm = TRUE),
  asp = 1
)
polypath(x, y, col = "grey90", rule = "evenodd")
points(p, cex = 2, pch = 16)
```

The package is available on CRAN; the source code is located at https://github.com/jolars/polylabelr and is documented at https://jolars.github.io/polylabelr/.

To come back around to where we started, **polylabelr** has now been employed in the development branch of **eulerr**, where it is used to quickly and appropriately figure out locations for the labels of the diagram.

```
library(eulerr)
plot(euler(s))
```

Eberly, David. 2013. “Distance from a Point to an Ellipse, an Ellipsoid, or a Hyperellipsoid.” Geometric Tools. June 28, 2013. https://www.geometrictools.com/Documentation/DistancePointEllipseEllipsoid.pdf.

Johnson, Angus, and Adrian Baddeley. 2018. “Polyclip: Polygon Clipping.” https://CRAN.R-project.org/package=polyclip.

Mapbox. (2016) 2018. “A Fast Algorithm for Finding the Pole of Inaccessibility of a Polygon (in JavaScript and C++): Mapbox/Polylabel.” Mapbox. https://github.com/mapbox/polylabel.

The goal of qualpalr is to select `n` colors so that the minimal pairwise distance among them is maximized; that is, we want the most similar pair of colors to be as dissimilar as possible. This turns out to be much less trivial than one would suspect, as posts on Computational Science, MATLAB Central, Stack Overflow, and Computer Science can attest.
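To see why, note that an exact solution has to search over every size-n subset of the candidate points. A brute-force base-R sketch, feasible only for tiny inputs (the function name is mine, for illustration):

```
# Exact maximin subset selection by exhaustive search: score every
# size-n subset by its smallest pairwise distance and keep the best.
# choose(m, n) subsets must be scored, so this explodes quickly.
maximin_exact <- function(points, n) {
  dmat <- as.matrix(stats::dist(points))
  subsets <- combn(nrow(dmat), n)
  scores <- apply(subsets, 2, function(idx) {
    sub <- dmat[idx, idx]
    min(sub[lower.tri(sub)])
  })
  subsets[, which.max(scores)]
}

# four points on a line: the best triple skips the crowded left end,
# selecting points 1, 3, and 4 (values 0, 2, and 10)
maximin_exact(matrix(c(0, 1, 2, 10)), 3)
```

With hundreds of candidate colors, exhaustive search is hopeless, which is why heuristics like the ones below are needed.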

Until now, qualpalr solved this problem with a greedy approach: to find `n` points, we did the following.

```
M <- Compute a distance matrix of all points in the sample
X <- Select the two most distant points from M
for i = 3:n
    X(i) <- Select the point in M that maximizes the
            minimum distance to all points in X
```

In R, this code looked like this (in two dimensions):

```
set.seed(1)

# find n points
n <- 3

mat <- matrix(runif(100), ncol = 2)
dmat <- as.matrix(stats::dist(mat))

ind <- integer(n)
ind[1:2] <- as.vector(arrayInd(which.max(dmat), .dim = dim(dmat)))

for (i in 3:n) {
  mm <- dmat[ind, -ind, drop = FALSE]
  k <- which.max(mm[(1:ncol(mm) - 1) * nrow(mm) + max.col(t(-mm))])
  ind[i] <- as.numeric(dimnames(mm)[[2]][k])
}

plot(mat, asp = 1, xlab = "", ylab = "")
points(mat[ind, ], pch = 19)
text(mat[ind, ], adj = c(0, -1.5))
```

While this greedy procedure is fast and works well for large values of `n`, it is quite inefficient in the example above. It is plain to see that there are other subsets of 3 points with a larger minimum distance, but because we base our selection on the previous 2 points, which were chosen to be maximally distant, the algorithm has to pick a suboptimal third point. The minimum distance in our example is 0.7641338.

The solution I came up with is based on one from Schlömer et al. (Schlömer, Heck, and Deussen 2011), who devised an algorithm to partition a set of points into subsets while maximizing the minimal distance. They used Delaunay triangulations, but I decided to simply use the distance matrix instead. The algorithm works as follows.

```
M <- Compute a distance matrix of all points in the sample
S <- Sample n points randomly from M
repeat
    for i = 1:n
        M <- Add S(i) back into M
        S(i) <- Find the point in M \ S with max min distance
                to any point in S
until S did not change
```

Iteratively, we put one point from our candidate subset (S) back into the original set (M) and check all distances between the points in S and those in M to find the point with the highest minimum distance. Rinse and repeat until we end up putting back the same points we started the loop with, which always happens eventually. Let’s see how this works on the same data set we used above.

```
r <- sample.int(nrow(dmat), n)

repeat {
  r_old <- r
  for (i in 1:n) {
    mm <- dmat[r[-i], -r[-i], drop = FALSE]
    k <- which.max(mm[(1:ncol(mm) - 1) * nrow(mm) + max.col(t(-mm))])
    r[i] <- as.numeric(dimnames(mm)[[2]][k])
  }
  if (identical(r_old, r)) break
}

plot(mat, asp = 1, xlab = "", ylab = "")
points(mat[r, ], pch = 19)
text(mat[r, ], adj = c(0, -1.5))
```

Here, we end up with a minimum distance of 0.8619587. In qualpalr, this means that we now achieve slightly more distinct colors.

The new algorithm is slightly slower than the old, greedy approach, and slightly more verbose:

```
f_greedy <- function(data, n) {
  dmat <- as.matrix(stats::dist(data))
  ind <- integer(n)
  ind[1:2] <- as.vector(arrayInd(which.max(dmat), .dim = dim(dmat)))
  for (i in 3:n) {
    mm <- dmat[ind, -ind, drop = FALSE]
    k <- which.max(mm[(1:ncol(mm) - 1) * nrow(mm) + max.col(t(-mm))])
    ind[i] <- as.numeric(dimnames(mm)[[2]][k])
  }
  ind
}

f_new <- function(data, n) {
  dmat <- as.matrix(stats::dist(data))
  r <- sample.int(nrow(dmat), n)
  repeat {
    r_old <- r
    for (i in 1:n) {
      mm <- dmat[r[-i], -r[-i], drop = FALSE]
      k <- which.max(mm[(1:ncol(mm) - 1) * nrow(mm) + max.col(t(-mm))])
      r[i] <- as.numeric(dimnames(mm)[[2]][k])
    }
    if (identical(r_old, r)) return(r)
  }
}
```

```
n <- 5
data <- matrix(runif(900), ncol = 3)
microbenchmark::microbenchmark(
f_greedy(data, n),
f_new(data, n),
times = 1000L
)
```

```
Unit: microseconds
              expr      min       lq     mean    median       uq      max neval cld
 f_greedy(data, n)  799.503  887.881 1158.959  990.9285 1143.117 39747.68  1000  a
    f_new(data, n) 1066.954 1508.136 1893.722 1710.8860 2002.331 13596.49  1000   b
```

The newest development version of qualpalr now uses this updated algorithm, which has also been generalized and included in my R package euclidr as a new function, `farthest_points()`.

Schlömer, Thomas, Daniel Heck, and Oliver Deussen. 2011. “Farthest-Point Optimized Point Sets with Maximized Minimum Distance.” In, 135. ACM Press. https://doi.org/bpmnsh.

R features a number of packages that produce Euler and/or Venn diagrams; some of the more prominent ones (on CRAN) are

- eVenn,
- VennDiagram,
- venn,
- colorfulVennPlot, and
- venneuler.

The last of these (venneuler) serves as the primary inspiration for this package, along with the refinements that Ben Fredrickson has presented on his blog and made available in his JavaScript library venn.js.

venneuler, however, is written in Java, preventing R users from browsing the source code (unless they are also literate in Java) or contributing. Furthermore, venneuler is known to produce imperfect output for set configurations that have perfect solutions. Consider, for instance, the following example, in which the intersection between `A` and `B` is unwanted.

```
library(venneuler, quietly = TRUE)
venn_fit <- venneuler(c(A = 75, B = 50, "A&B" = 0))
plot(venn_fit)
```

eulerr is based on the improvements to **venneuler** that Ben Fredrickson introduced with **venn.js**, but has been coded from scratch, uses different optimizers, and returns the residuals and stress statistic that venneuler features.

Currently, it is possible to provide input to `eulerr` as either

- a named numeric vector or
- a matrix of logicals with columns representing sets and rows the set relationships for each observation.

```
library(eulerr)

# Input in the form of a named numeric vector
fit1 <- euler(c("A" = 25, "B" = 5, "C" = 5,
                "A&B" = 5, "A&C" = 5, "B&C" = 3,
                "A&B&C" = 3))

# Input as a matrix of logicals
set.seed(1)
mat <- cbind(
  A = sample(c(TRUE, TRUE, FALSE), size = 50, replace = TRUE),
  B = sample(c(TRUE, FALSE), size = 50, replace = TRUE),
  C = sample(c(TRUE, FALSE, FALSE, FALSE), size = 50, replace = TRUE)
)
fit2 <- euler(mat)
```

We inspect our results by printing the eulerr object:

`fit2`

```
        original fitted residuals regionError
A             13     13         0       0.008
B              4      4         0       0.002
C              0      0         0       0.000
A&B           17     17         0       0.010
A&C            5      5         0       0.003
B&C            1      0         1       0.024
A&B&C          2      2         0       0.001

diagError: 0.024
stress:    0.002
```

or directly access and plot the residuals.

```
# Cleveland dot plot of the residuals
dotchart(resid(fit2))
abline(v = 0, lty = 3)
```

This shows us that the `B&C` intersection is somewhat underrepresented in `fit2`. Given that these residuals are on the scale of the original values, however, they are arguably of little concern.

For an overall measure of the fit of the solution, we use the same stress statistic that Leland Wilkinson presented in his academic paper on venneuler (Wilkinson 2012), which is given by the sums of squared residuals divided by the total sums of squares:

$$\text{stress} = \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{\sum_{i=1}^n y_i^2}$$

We fetch it from the `stress` attribute of the `eulerr` object:

`fit2$stress`

`[1] 0.00198`

We can now be confident that eulerr provides a reasonable representation of our input. Were it otherwise, we would do best to stop here and look for another way to visualize our data. (I suggest the excellent UpSetR package.)

Now we get to the fun part: plotting our diagram. This is easy, as well as highly customizable, with eulerr.

```
plot(fit2)

# Change fill colors, border line types, and font face
plot(
  fit2,
  fills = c("dodgerblue4", "plum2", "seashell2"),
  edges = list(lty = 1:3),
  labels = list(font = 2)
)
```

eulerr’s default color palette is taken from qualpalr – another package that I have developed – which uses color difference algorithms to generate distinct qualitative color palettes.

Details of the implementation will be left for a future vignette but almost completely resemble the approach documented here.

eulerr would not be possible without Ben Fredrickson’s work on venn.js or Leland Wilkinson’s venneuler.

Wilkinson, L. 2012. “Exact and Approximate Area-Proportional Circular Venn and Euler Diagrams.” *IEEE Transactions on Visualization and Computer Graphics* 18 (2): 321–31. https://doi.org/10.1109/TVCG.2011.56.

With the advent of colorbrewer, there now exist good options for generating color palettes for sequential, diverging, and qualitative data. In R, these palettes can be accessed via the popular RColorBrewer package. Those palettes, however, are limited to a fixed number of colors. This isn’t much of a problem for sequential or diverging data, since we can interpolate the colors to any number we desire:

```
pal <- RColorBrewer::brewer.pal(4, "PuBuGn")
color_ramp <- colorRampPalette(pal, space = "Lab")
```

There is, however, no analogue for qualitative color palettes that gets you beyond the 8–12 colors of colorbrewer’s qualitative palettes. There is also no way to customize the palettes. Other R packages, such as colorspace, offer customization, but they are primarily adapted to sequential and diverging data – not qualitative data.

This is where qualpalr comes in. qualpalr provides the user with a convenient way of generating distinct qualitative color palettes, primarily for use in R graphics. Given `n` (the number of colors to generate), along with a subset of the HSL color space (a cylindrical representation of the RGB color space), `qualpalr` attempts to find the `n` colors in the provided color subspace that *maximize the smallest pairwise color difference*. This is done by projecting the color subset from the HSL color space to the DIN99d space. DIN99d is (approximately) perceptually uniform, that is, the Euclidean distance between two colors in the space is proportional to their perceived difference.
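To make the “distance means perceived difference” idea concrete, here is a small base-R sketch. It uses CIELab, via `grDevices::convertColor`, as a stand-in for DIN99d (which base R cannot convert to), and the helper name is made up for the example:

```
# score a palette by its smallest pairwise perceptual difference;
# CIELab stands in for DIN99d here since base R has no DIN99d support
min_color_diff <- function(hex) {
  rgb01 <- t(col2rgb(hex)) / 255  # hex -> sRGB coordinates in [0, 1]
  lab <- grDevices::convertColor(rgb01, from = "sRGB", to = "Lab")
  dmat <- as.matrix(dist(lab))    # Euclidean distance ~ perceived difference
  min(dmat[lower.tri(dmat)])
}

# a spread-out palette scores much higher than three near-identical greens
min_color_diff(c("#73CA6F", "#D37DAD", "#6C7DCC"))
min_color_diff(c("#73CA6F", "#74CB70", "#72C96E"))
```

qualpalr’s job is precisely to pick the `n` colors for which this minimum is as large as possible.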

`qualpalr` relies on one basic function, `qualpal()`, which takes as its input `n` (the number of colors to generate) and `colorspace`, which can be either

- a list of numeric vectors `h` (hue from -360 to 360), `s` (saturation from 0 to 1), and `l` (lightness from 0 to 1), all of length 2, each specifying a min and max, or
- a character vector specifying one of the predefined color subspaces, which at the time of writing are *pretty*, *pretty_dark*, *rainbow*, and *pastels*.

```
library(qualpalr)

pal <- qualpal(
  n = 5,
  list(
    h = c(0, 360),
    s = c(0.4, 0.6),
    l = c(0.5, 0.85)
  )
)

# Adapt the color space to deuteranopia
pal <- qualpal(n = 5, colorspace = "pretty", cvd = "deutan")
```

The resulting object, `pal`, is a list with several color tables and a distance matrix based on the DIN99d color difference formula.

`pal`

```
----------------------------------------
 Colors in the HSL color space

         Hue Saturation Lightness
#73CA6F  117       0.46      0.61
#D37DAD  327       0.50      0.66
#C6DBE8  203       0.42      0.84
#6C7DCC  229       0.48      0.61
#D0A373   31       0.50      0.63

----------------------------------------
 DIN99d color difference distance matrix

        #73CA6F #D37DAD #C6DBE8 #6C7DCC
#D37DAD      28
#C6DBE8      19      21
#6C7DCC      27      19      19
#D0A373      19      18      20      25
```

Methods for `pairs` and `plot` have been written for `qualpal` objects to help visualize the results.

`plot(pal)`

`pairs(pal, colorspace = "DIN99d", asp = 1)`

The colors are normally used in R by fetching the `hex` attribute of the palette, so it is straightforward to use the output to, say, color the provinces of France (Figure 3).

```
library(maps)
map("france", fill = TRUE, col = pal$hex, mar = c(0, 0, 0, 0))
```

`qualpal` begins by generating a point cloud out of the HSL color subspace provided by the user, using a quasi-random torus sequence from randtoolbox. Here is the color subset in HSL with settings `h = c(-200, 120), s = c(0.3, 0.8), l = c(0.4, 0.9)`.

The function then proceeds by projecting these colors into the sRGB space (Figure 5).

It then continues by projecting the colors, first into the XYZ space, then CIELab (not shown here), and then finally the DIN99d space (Figure 6).

The DIN99d color space (Cui et al. 2002) is a Euclidean, perceptually uniform color space. This means that the difference between two colors is equal to the Euclidean distance between them. We take advantage of this by computing a distance matrix of all the colors in the subset, giving their pairwise color differences. We then apply a power transformation (Huang et al. 2015) to fine-tune these differences.

To select the `n` colors that the user asked for, we proceed greedily: first, we find the two most distant points, then we find the third point that maximizes the minimum distance to the previously selected points. This is repeated until `n` points have been selected. These points are then returned to the user; below is an example using `n = 5`.

At the time of writing, qualpalr only works in the sRGB color space with the CIE Standard Illuminant D65 reference white.

The greedy search to find distinct colors is crude. Particularly when searching for few colors, the greedy algorithm will lead to sub-optimal results. Other solutions to finding points that maximize the smallest pairwise distance among them are welcome.

Bruce Lindbloom’s webpage has been instrumental in making qualpalr. Also thanks to i want hue, which inspired me to make qualpalr.

Cui, G., M. R. Luo, B. Rigg, G. Roesler, and K. Witt. 2002. “Uniform Colour Spaces Based on the DIN99 Colour-Difference Formula.” *Color Research & Application* 27 (4): 282–90. https://doi.org/cz7764.

Huang, Min, Guihua Cui, Manuel Melgosa, Manuel Sánchez-Marañón, Changjun Li, M. Ronnier Luo, and Haoxue Liu. 2015. “Power Functions Improving the Performance of Color-Difference Formulas.” *Optics Express* 23 (1): 597. https://doi.org/gcsk6f.