[1]:

import pandas as pd
pd.set_option("display.max_rows", 5)


Arrange

This function lets you arrange the rows of your data in two steps:

• choosing columns to arrange by

• specifying an order (ascending or descending)

Below, we’ll illustrate this function with a single variable, multiple variables, and more general expressions.

[2]:

from siuba import _, arrange, select
from siuba.data import mtcars

small_mtcars = mtcars >> select(_.cyl, _.mpg, _.hp)

small_mtcars

[2]:

cyl mpg hp
0 6 21.0 110
1 6 21.0 110
... ... ... ...
30 8 15.0 335
31 4 21.4 109

32 rows × 3 columns

Arranging rows by a single variable

The simplest way to use arrange is to specify a column name. Under the hood, arrange uses pandas' DataFrame.sort_values and, by default, sorts rows in ascending order.

For example, the code below arranges the rows from least to greatest horsepower (hp).

[3]:

# simple arrange of 1 var
small_mtcars >> arrange(_.hp)

[3]:

cyl mpg hp
18 4 30.4 52
7 4 24.4 62
... ... ... ...
28 8 15.8 264
30 8 15.0 335

32 rows × 3 columns
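Because arrange delegates to sort_values, a rough plain-pandas equivalent of the cell above is a single-column sort. This is a sketch on a small hypothetical DataFrame with made-up values, not siuba's actual implementation:

```python
import pandas as pd

# Hypothetical toy data standing in for small_mtcars (values are made up).
toy = pd.DataFrame({
    "cyl": [6, 4, 8, 4],
    "mpg": [21.0, 30.4, 15.0, 24.4],
    "hp": [110, 52, 335, 62],
})

# Roughly what arrange(_.hp) does: an ascending sort on one column.
# The original index labels stay attached to their rows, as in the output above.
by_hp = toy.sort_values("hp")
print(by_hp)
```

Note that the index is not reset, which is why the siuba output above shows row labels such as 18 and 7 rather than 0 and 1.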

If you put a minus sign (-) before a column or expression, arrange sorts the rows in descending order. This works for all column types, including string and categorical columns!

[4]:

small_mtcars >> arrange(-_.hp)

[4]:

cyl mpg hp
30 8 15.0 335
28 8 15.8 264
... ... ... ...
7 4 24.4 62
18 4 30.4 52

32 rows × 3 columns
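In plain pandas, descending order is requested with the ascending=False flag rather than by negating the column. The sketch below uses a small hypothetical DataFrame (the "model" column is made up) to show this for both a numeric and a string column:

```python
import pandas as pd

# Hypothetical toy data; "model" is an invented string column.
toy = pd.DataFrame({
    "cyl": [6, 4, 8, 4],
    "hp": [110, 52, 335, 62],
    "model": ["b", "d", "a", "c"],
})

# Roughly what arrange(-_.hp) does: sort hp from greatest to least.
desc_hp = toy.sort_values("hp", ascending=False)

# The same flag works for string columns, sorting them in reverse alphabetical order.
desc_model = toy.sort_values("model", ascending=False)

print(desc_hp)
print(desc_model)
```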

Arranging rows by multiple variables

When arrange receives multiple arguments, it sorts by the first one, uses the second to break ties within the first, and so on. In other words, the first column changes the slowest.

[5]:

small_mtcars >> arrange(_.cyl, _.mpg)

[5]:

cyl mpg hp
31 4 21.4 109
20 4 21.5 97
... ... ... ...
4 8 18.7 175
24 8 19.2 175

32 rows × 3 columns
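A plain-pandas sketch of the same idea passes a list of column names to sort_values. The data below is a small hypothetical example, chosen so that two cyl groups contain ties that mpg has to break:

```python
import pandas as pd

# Hypothetical toy data (values are made up).
toy = pd.DataFrame({
    "cyl": [8, 4, 6, 4, 8],
    "mpg": [15.0, 30.4, 21.0, 24.4, 10.4],
})

# Roughly what arrange(_.cyl, _.mpg) does: cyl is the primary key,
# and mpg only orders rows that share the same cyl value.
multi = toy.sort_values(["cyl", "mpg"])
print(multi)
```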

[6]:

small_mtcars >> arrange(_.cyl, -_.mpg)

[6]:

cyl mpg hp
19 4 33.9 65
17 4 32.4 66
... ... ... ...
14 8 10.4 205
15 8 10.4 215

32 rows × 3 columns
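Mixing directions like this corresponds, in plain pandas, to passing one ascending flag per sort column. The sketch below reuses the same hypothetical toy data as before:

```python
import pandas as pd

# Hypothetical toy data (values are made up).
toy = pd.DataFrame({
    "cyl": [8, 4, 6, 4, 8],
    "mpg": [15.0, 30.4, 21.0, 24.4, 10.4],
})

# Roughly what arrange(_.cyl, -_.mpg) does: cyl ascending,
# then mpg descending within each cyl group.
mixed = toy.sort_values(["cyl", "mpg"], ascending=[True, False])
print(mixed)
```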

Expressions

You can also arrange the rows of your data using more complex expressions, similar to those you would use in a mutate.

For example, the code below sorts by horsepower (hp) per cylinder (cyl).

[7]:

small_mtcars >> arrange(_.hp / _.cyl)

[7]:

cyl mpg hp
18 4 30.4 52
7 4 24.4 62
... ... ... ...
28 8 15.8 264
30 8 15.0 335

32 rows × 3 columns
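sort_values only accepts column names, so one way to sort by an expression in plain pandas is to add the expression as a temporary column, sort on it, and drop it again. This is a sketch on hypothetical data (the helper name _sort_key is invented) and is not necessarily how siuba implements it; the values are chosen so that sorting by hp / cyl gives a different order than sorting by hp alone:

```python
import pandas as pd

# Hypothetical toy data (values are made up).
toy = pd.DataFrame({"cyl": [4, 8, 6], "hp": [120, 160, 90]})

# Roughly arrange(_.hp / _.cyl): compute the expression as a temporary
# sort key (ratios 30.0, 20.0, 15.0), sort on it, then drop the helper column.
by_ratio = (
    toy.assign(_sort_key=toy["hp"] / toy["cyl"])
    .sort_values("_sort_key")
    .drop(columns="_sort_key")
)
print(by_ratio)
```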

Arranging categorical series

Note that a categorical column is arranged according to the order of its categories, not the alphabetical order of its values. For example, the DataFrame below has a single categorical column with three entries.

[8]:

df = pd.DataFrame({
"x_cat": pd.Categorical(["c", "b", "a"])
})

df

[8]:

x_cat
0 c
1 b
2 a

While the values in the column go from “c” to “a”, the categories (levels) of a categorical are sorted by default, so they go from “a” to “c”. You can see this in the last line of the output below.

[9]:

df.x_cat

[9]:

0    c
1    b
2    a
Name: x_cat, dtype: category
Categories (3, object): ['a', 'b', 'c']


Since pandas' sort_values orders a categorical by the sequence listed under “Categories”, arrange does the same.

[10]:

df >> arrange(_.x_cat)

[10]:

x_cat
2 a
1 b
0 c
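To see that the ordering comes from the categories rather than from the values themselves, here is a plain-pandas sketch with hypothetical, non-alphabetical categories. Sorting the categorical follows the declared category order, while sorting the same values as plain strings is alphabetical:

```python
import pandas as pd

# Hypothetical data: categories are declared in a deliberately non-alphabetical order.
sizes = pd.Series(pd.Categorical(
    ["small", "large", "medium"],
    categories=["small", "medium", "large"],
))

by_category = sizes.sort_values()           # follows the category order
by_string = sizes.astype(str).sort_values() # plain strings: alphabetical

print(list(by_category))
print(list(by_string))
```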

This means that if you reorder the categories, arrange will follow the new order!

[11]:

from siuba.dply.forcats import fct_rev

df["rev_x_cat"] = fct_rev(df.x_cat)
df.rev_x_cat

[11]:

0    c
1    b
2    a
Name: rev_x_cat, dtype: category
Categories (3, object): ['c', 'b', 'a']

[12]:

df >> arrange(_.rev_x_cat)

[12]:

x_cat rev_x_cat
0 c c
1 b b
2 a a
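If you are working in plain pandas, one way to get the same reversed category order is Series.cat.reorder_categories. This is a sketch of an equivalent result, not necessarily how fct_rev is implemented:

```python
import pandas as pd

df = pd.DataFrame({"x_cat": pd.Categorical(["c", "b", "a"])})

# Reverse the category order ('a', 'b', 'c' -> 'c', 'b', 'a')
# while leaving the values in each row untouched.
df["rev_x_cat"] = df["x_cat"].cat.reorder_categories(
    df["x_cat"].cat.categories[::-1]
)

print(list(df["rev_x_cat"].cat.categories))
print(df.sort_values("rev_x_cat"))  # rows stay in c, b, a order
print(df.sort_values("x_cat"))      # rows come out as a, b, c
```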
