The simplest way to use arrange is to specify a column name. The arrange function uses the pandas sort_values method under the hood, and arranges rows in ascending order by default.
For example, the code below arranges the rows from least to greatest horsepower (hp).
# simple arrange of 1 var
small_mtcars >> arrange(_.hp)
    cyl   mpg   hp
18    4  30.4   52
7     4  24.4   62
..  ...   ...  ...
28    8  15.8  264
30    8  15.0  335
32 rows × 3 columns
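Since arrange is described as a wrapper around sort_values, the same ordering can be sketched in plain pandas. The frame below is an assumption for illustration: it holds only the four rows visible in the output above, not the full 32-row small_mtcars, and the names sample and by_hp are hypothetical.

```python
import pandas as pd

# Four rows copied from the output shown above (not the full 32-row data).
sample = pd.DataFrame(
    {"cyl": [8, 4, 8, 4], "mpg": [15.0, 30.4, 15.8, 24.4], "hp": [335, 52, 264, 62]},
    index=[30, 18, 28, 7],
)

# Ascending sort by hp, mirroring what the text says arrange(_.hp) does.
by_hp = sample.sort_values("hp")
print(by_hp)
```

The original row labels are kept, which is why the index in the arranged output is no longer in numeric order.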
Sort in descending order
If you add a - before a column or expression, arrange will sort the rows in descending order. This applies to all types of columns, including arrays of strings and categories!
small_mtcars >> arrange(-_.hp)
    cyl   mpg   hp
30    8  15.0  335
28    8  15.8  264
..  ...   ...  ...
7     4  24.4   62
18    4  30.4   52
32 rows × 3 columns
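In plain pandas, a unary minus does not work on string columns, so the closest counterpart to a descending arrange is the ascending=False argument of sort_values. The sketch below uses a hypothetical one-column frame invented for this illustration. It shows the pandas side only and makes no claim about how siuba implements the minus sign internally.

```python
import pandas as pd

# Hypothetical string column, invented for this illustration only.
names = pd.DataFrame({"name": ["b", "a", "c"]})

# ascending=False is the pandas way to request a descending sort.
desc = names.sort_values("name", ascending=False)
print(desc["name"].tolist())
```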
Arrange by multiple variables
When arrange receives multiple arguments, it sorts so that the one specified first changes the slowest, followed by the second, and so on.
small_mtcars >> arrange(_.cyl, -_.mpg)
    cyl   mpg   hp
19    4  33.9   65
17    4  32.4   66
..  ...   ...  ...
14    8  10.4  205
15    8  10.4  215
32 rows × 3 columns
Notice that in the result above, cyl values are sorted first. In other words, all of the 4’s are bunched together, with mpg sorted in descending order within each bunch.
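A rough pandas counterpart of this two-key sort passes a list of columns and a matching list of directions to sort_values. The frame is an assumption: six rows taken from the outputs shown on this page rather than the full data, and the names sample and by_cyl_mpg are hypothetical.

```python
import pandas as pd

# Six rows copied from the outputs shown on this page (not the full data).
sample = pd.DataFrame(
    {
        "cyl": [8, 4, 8, 4, 8, 4],
        "mpg": [15.0, 24.4, 10.4, 33.9, 15.8, 30.4],
        "hp": [335, 62, 205, 65, 264, 52],
    },
    index=[30, 7, 14, 19, 28, 18],
)

# cyl ascending first, then mpg descending within each cyl group.
by_cyl_mpg = sample.sort_values(["cyl", "mpg"], ascending=[True, False])
print(by_cyl_mpg)
```

The first key varies slowest: every 4-cylinder row comes before any 8-cylinder row, and mpg only decides the order inside each group.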
Using expressions
You can also arrange the rows of your data using more complex expressions, similar to those you would use in a mutate.
For example, the code below sorts by horsepower (hp) per cylinder (cyl).
small_mtcars >> arrange(_.hp / _.cyl)
    cyl   mpg   hp
18    4  30.4   52
7     4  24.4   62
..  ...   ...  ...
28    8  15.8  264
30    8  15.0  335
32 rows × 3 columns
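One way to sketch an expression-based sort in plain pandas is to compute the key as a Series, sort it, and reorder the rows by the key's index. This is only an illustration under the same four-row assumption used for the simple case. It is not necessarily how arrange evaluates expressions, and the names hp_per_cyl and by_ratio are hypothetical.

```python
import pandas as pd

# Four rows copied from the output shown above (not the full 32-row data).
sample = pd.DataFrame(
    {"cyl": [8, 4, 8, 4], "mpg": [15.0, 30.4, 15.8, 24.4], "hp": [335, 52, 264, 62]},
    index=[30, 18, 28, 7],
)

# Compute the sort key, sort it, then reorder the rows by the key's index.
hp_per_cyl = sample["hp"] / sample["cyl"]
by_ratio = sample.loc[hp_per_cyl.sort_values().index]
print(by_ratio)
```

The key never becomes a column of the result, which matches the output above, where only cyl, mpg, and hp appear.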
Categorical series behavior
arrange uses the pandas sort_values method behind the scenes, which sorts pd.Categorical data by category order rather than alphabetically.
ser = pd.Categorical(["a", "z"], categories=["z", "a"])
ser
['a', 'z']
Categories (2, object): ['z', 'a']
ser.sort_values()
['z', 'a']
Categories (2, object): ['z', 'a']
Siuba contains a submodule called forcats that makes it easy to change the category order.
from siuba.dply.forcats import fct_rev

# reverse the category order
fct_rev(ser)
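To see why category order matters for sorting, the sketch below reverses the categories using pandas alone, via Categorical.reorder_categories. It is a stand-in for illustration, not siuba's implementation of fct_rev, and the name reversed_ser is hypothetical.

```python
import pandas as pd

ser = pd.Categorical(["a", "z"], categories=["z", "a"])

# Reverse the category order using pandas only.
reversed_ser = ser.reorder_categories(ser.categories[::-1])

# Sorting now follows the new category order: "a" before "z".
print(list(ser.sort_values()))
print(list(reversed_ser.sort_values()))
```

The values themselves are unchanged. Only the category order differs, and sort_values follows it.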