Arrange rows

This function lets you arrange the rows of your data, through two steps…

Below, we’ll illustrate this function with a single variable, multiple variables, and more general expressions.

from siuba import _, arrange, select
from siuba.data import mtcars

small_mtcars = mtcars >> select(_.cyl, _.mpg, _.hp)

small_mtcars
cyl mpg hp
0 6 21.0 110
1 6 21.0 110
... ... ... ...
30 8 15.0 335
31 4 21.4 109

32 rows × 3 columns

Basics

The simplest way to use arrange is to specify a column name. The arrange function uses the pandas DataFrame.sort_values() method under the hood, and arranges rows in ascending order by default.

For example, the code below arranges the rows from least to greatest horsepower (hp).

# simple arrange of 1 var
small_mtcars >> arrange(_.hp)
cyl mpg hp
18 4 30.4 52
7 4 24.4 62
... ... ... ...
28 8 15.8 264
30 8 15.0 335

32 rows × 3 columns
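Since arrange is described as relying on sort_values, the same ordering can be sketched in plain pandas. The block below is only an illustration: the sample frame is a hand-copied handful of rows from the table above (not the full dataset), and treating sort_values("hp") as the counterpart of arrange(_.hp) is an assumption based on the description above.

```python
import pandas as pd

# Five rows copied from small_mtcars above, keeping their original index labels.
sample = pd.DataFrame(
    {
        "cyl": [6, 4, 4, 8, 8],
        "mpg": [21.0, 30.4, 24.4, 15.8, 15.0],
        "hp": [110, 52, 62, 264, 335],
    },
    index=[0, 18, 7, 28, 30],
)

# Assumed pandas counterpart of `sample >> arrange(_.hp)`: ascending by hp.
by_hp = sample.sort_values("hp")
print(by_hp)
```

Note that the original index labels travel with their rows, which is why the outputs on this page show labels such as 18 and 7 at the top.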

Sort in descending order

If you add a - before a column or expression, arrange will sort the rows in descending order. This applies to all types of columns, including arrays of strings and categories!

small_mtcars >> arrange(-_.hp)
cyl mpg hp
30 8 15.0 335
28 8 15.8 264
... ... ... ...
7 4 24.4 62
18 4 30.4 52

32 rows × 3 columns
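For comparison, descending order in plain pandas is requested with ascending=False rather than a minus sign. The sketch below assumes that this is roughly what the - prefix maps to; the "label" column is a made-up string column added only to show that text sorts in reverse as well.

```python
import pandas as pd

# Three rows from small_mtcars plus a hypothetical string column "label".
sample = pd.DataFrame(
    {"hp": [110, 52, 335], "label": ["medium", "small", "large"]},
    index=[0, 18, 30],
)

# Numeric column, greatest to least.
hp_desc = sample.sort_values("hp", ascending=False)

# String column, reverse alphabetical order.
label_desc = sample.sort_values("label", ascending=False)

print(hp_desc)
print(label_desc)
```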

Arrange by multiple variables

When arrange receives multiple arguments, it sorts so that the one specified first changes the slowest, followed by the second, and so on.

small_mtcars >> arrange(_.cyl, -_.mpg)
cyl mpg hp
19 4 33.9 65
17 4 32.4 66
... ... ... ...
14 8 10.4 205
15 8 10.4 215

32 rows × 3 columns

Notice that in the result above, cyl values are sorted first. In other words, all of the 4’s are bunched together, with mpg sorted in descending order within each bunch.
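The "first argument changes slowest" rule can also be seen in a plain pandas sketch, where a list of column names is paired with a list of ascending flags. The rows below are copied from the tables on this page; using sort_values this way as a stand-in for arrange(_.cyl, -_.mpg) is an assumption.

```python
import pandas as pd

# Five rows copied from small_mtcars, keeping their original index labels.
sample = pd.DataFrame(
    {
        "cyl": [8, 4, 6, 4, 8],
        "mpg": [10.4, 32.4, 21.0, 33.9, 15.0],
        "hp": [205, 66, 110, 65, 335],
    },
    index=[14, 17, 0, 19, 30],
)

# cyl ascending first, then mpg descending within each cyl group.
by_cyl_mpg = sample.sort_values(["cyl", "mpg"], ascending=[True, False])
print(by_cyl_mpg)
```

Each cyl value forms one contiguous group, and mpg only decides the order inside a group.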

Using expressions

You can also arrange the rows of your data using more complex expressions, similar to those you would use in a mutate.

For example, the code below sorts by horsepower (hp) per cylinder (cyl).

small_mtcars >> arrange(_.hp / _.cyl)
cyl mpg hp
18 4 30.4 52
7 4 24.4 62
... ... ... ...
28 8 15.8 264
30 8 15.0 335

32 rows × 3 columns
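One way to picture sorting by an expression is to compute the expression as a temporary key, sort the key, and reorder the rows to match. The sketch below does this in plain pandas with rows copied from this page; it is an illustration of the idea, not a description of how arrange is implemented.

```python
import pandas as pd

# Six rows copied from small_mtcars, keeping their original index labels.
sample = pd.DataFrame(
    {
        "cyl": [6, 4, 4, 8, 8, 4],
        "mpg": [21.0, 30.4, 24.4, 15.8, 15.0, 21.4],
        "hp": [110, 52, 62, 264, 335, 109],
    },
    index=[0, 18, 7, 28, 30, 31],
)

# Temporary sort key: horsepower per cylinder.
key = sample["hp"] / sample["cyl"]

# Reorder the rows by the sorted key; the key itself is not kept as a column.
by_ratio = sample.loc[key.sort_values().index]
print(by_ratio)
```

In this sample, row 31 (109 hp, 4 cylinders) sorts before row 0 (110 hp, 6 cylinders) when ordering by hp alone, but after it when ordering by hp per cylinder.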

Categorical series behavior

arrange uses the pandas sort_values() method behind the scenes, which sorts categorical data (pd.Categorical) by category order rather than alphabetically.

import pandas as pd

ser = pd.Categorical(["a", "z"], categories=["z", "a"])

ser
['a', 'z']
Categories (2, object): ['z', 'a']
ser.sort_values()
['z', 'a']
Categories (2, object): ['z', 'a']

Siuba contains a submodule called forcats that makes it easy to change the category order.

from siuba.dply.forcats import fct_rev

# reverse the category order
fct_rev(ser)
['a', 'z']
Categories (2, object): ['a', 'z']
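To see how a reversed category order changes sorting, here is a pandas-only sketch. It uses Categorical.reorder_categories as a stand-in for fct_rev; that the two give the same result is an assumption that holds for this small example, not a statement about how fct_rev is written.

```python
import pandas as pd

cat = pd.Categorical(["a", "z"], categories=["z", "a"])

# Reverse the category order by hand: ["z", "a"] becomes ["a", "z"].
reversed_cat = cat.reorder_categories(cat.categories[::-1])

# Sorting follows the category order, so the two results differ.
print(list(cat.sort_values()))           # sorted with "z" first
print(list(reversed_cat.sort_values()))  # sorted with "a" first
```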

You can learn more in the siuba forcats docs.