complete
complete(__data, *args, fill=None, explicit=True)
Add rows to fill in missing combinations in the data.
This is a wrapper around expand() and right_join(), followed by filling NAs.
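The three steps named above (expand, right join, fill) can be sketched in plain pandas. This is a rough illustration of the idea, not siuba's actual implementation; the variable names `combos`, `completed`, and `filled` are made up for the example, and the data frame matches the one used in the Examples section.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 2], "y": ["a", "a", "b"], "z": [8, 9, None]})

# Step 1 (expand): every combination of the observed x and y values.
combos = pd.MultiIndex.from_product(
    [df["x"].unique(), df["y"].unique()], names=["x", "y"]
).to_frame(index=False)

# Step 2 (right join): keep every combination, whether or not df has a match.
completed = df.merge(combos, how="right", on=["x", "y"])

# Step 3 (fill): replace missing values in the chosen columns only.
filled = completed.fillna({"z": 999})
```

Here `completed` has four rows, one per (x, y) pair, and `filled` replaces every missing `z`, which corresponds to the default `explicit=True` behavior.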
Parameters
Name | Type | Description | Default |
---|---|---|---|
__data | | The input data. | required |
*args | | Columns to cross and expand. | () |
fill | | A dictionary specifying what to use for missing values in each column. If a column is not specified, its missing values are left as is. | None |
explicit | | Whether fill applies both to NAs created by complete and to NAs already present in the data. Defaults to True (fill both). When set to False, only newly created NAs are filled. | True |
Examples
>>> import pandas as pd
>>> from siuba import _, complete, expand, count, anti_join, right_join
>>> df = pd.DataFrame({"x": [1, 2, 2], "y": ["a", "a", "b"], "z": [8, 9, None]})
>>> df
   x  y    z
0  1  a  8.0
1  2  a  9.0
2  2  b  NaN
>>> df >> complete(_.x, _.y)
   x  y    z
0  1  a  8.0
1  1  b  NaN
2  2  a  9.0
3  2  b  NaN
Use the fill argument to replace missing values:
>>> df >> complete(_.x, _.y, fill={"z": 999})
   x  y      z
0  1  a    8.0
1  1  b  999.0
2  2  a    9.0
3  2  b  999.0
A common use of complete is to make zero counts explicit (e.g. for charting):
>>> df >> count(_.x, _.y) >> complete(_.x, _.y, fill={"n": 0})
   x  y    n
0  1  a  1.0
1  1  b  0.0
2  2  a  1.0
3  2  b  1.0
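For comparison, the same zero-count table can be built with pandas alone by reindexing grouped sizes over the full set of pairs. This is a sketch of an alternative idiom, not what complete does internally; `all_pairs` and `full_counts` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 2], "y": ["a", "a", "b"], "z": [8, 9, None]})

# Count rows per observed (x, y) pair; the result is indexed by (x, y).
counts = df.groupby(["x", "y"]).size()

# All combinations of the observed x and y values.
all_pairs = pd.MultiIndex.from_product(
    [df["x"].unique(), df["y"].unique()], names=["x", "y"]
)

# Reindex onto the full grid, using 0 for pairs that never occurred.
full_counts = counts.reindex(all_pairs, fill_value=0).reset_index(name="n")
```

Because `fill_value=0` is supplied during the reindex, no NaN is ever introduced, so `n` stays an integer column here rather than becoming float as in the output above.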
Use explicit=False to only fill the NaNs introduced by complete (implicit missing), and not those already in the original data (explicit missing):
>>> df >> complete(_.x, _.y, fill={"z": 999}, explicit=False)
   x  y      z
0  1  a    8.0
1  1  b  999.0
2  2  a    9.0
3  2  b    NaN
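One way to picture the implicit/explicit distinction is with the `indicator` option of pandas `merge`, which records where each joined row came from. The sketch below is an assumption about how such a distinction could be made, not a description of siuba's internals; `is_new` and `result` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 2], "y": ["a", "a", "b"], "z": [8, 9, None]})
combos = pd.MultiIndex.from_product(
    [df["x"].unique(), df["y"].unique()], names=["x", "y"]
).to_frame(index=False)

# indicator=True adds a "_merge" column; "right_only" marks rows that exist
# only because the combination was missing from df (implicit missing).
joined = df.merge(combos, how="right", on=["x", "y"], indicator=True)
is_new = joined["_merge"] == "right_only"

# Fill z only in the newly created rows; the NaN already present at
# (2, "b") in the original data (explicit missing) is left untouched.
joined.loc[is_new, "z"] = joined.loc[is_new, "z"].fillna(999)
result = joined.drop(columns="_merge")
```

Only the (1, "b") row is flagged as new, so it is the only one that receives 999.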