from siuba import _, arrange, select
from siuba.data import mtcars
small_mtcars = mtcars >> select(_.cyl, _.mpg, _.hp)
small_mtcars

| | cyl | mpg | hp |
|---|---|---|---|
| 0 | 6 | 21.0 | 110 |
| 1 | 6 | 21.0 | 110 |
| ... | ... | ... | ... |
| 30 | 8 | 15.0 | 335 |
| 31 | 4 | 21.4 | 109 |
32 rows × 3 columns
The arrange function lets you sort the rows of your data, through two steps…
Below, we’ll illustrate this function with a single variable, multiple variables, and more general expressions.
The simplest way to use arrange is to specify a column name. The arrange function uses pandas' DataFrame.sort_values under the hood, and arranges rows in ascending order by default.
For example, the code below arranges the rows from least to greatest horsepower (hp).
If you add a - before a column or expression, arrange will sort the rows in descending order. This applies to all types of columns, including arrays of strings and categories!
When arrange receives multiple arguments, it sorts so that the one specified first changes the slowest, followed by the second, and so on.
| | cyl | mpg | hp |
|---|---|---|---|
| 19 | 4 | 33.9 | 65 |
| 17 | 4 | 32.4 | 66 |
| ... | ... | ... | ... |
| 14 | 8 | 10.4 | 205 |
| 15 | 8 | 10.4 | 215 |
32 rows × 3 columns
Notice that in the result above, cyl values are sorted first. In other words, all of the 4’s are bunched together, with mpg sorted in descending order within each bunch.
You can also arrange the rows of your data using more complex expressions, similar to those you would use in a mutate.
For example, the code below sorts by horsepower (hp) per cylinder (cyl).
arrange uses pandas' DataFrame.sort_values() behind the scenes, which sorts categorical (pd.Categorical) columns by their category order rather than alphabetically.
['a', 'z']
Categories (2, object): ['z', 'a']
Siuba contains a submodule called forcats that makes it easy to change the category order.
['a', 'z']
Categories (2, object): ['a', 'z']
You can learn more in the siuba forcats docs.