separate
separate(__data, col, into, sep='\[^a-zA-Z0-9\]', remove=True, convert=False, extra='warn', fill='warn')
Split col into len(into) piece. Return DataFrame with a column added for each piece.
Parameters
Name | Type | Description | Default |
---|---|---|---|
__data |
a DataFrame. | required | |
col |
name of column to split (either string, or siu expression). | required | |
into |
names of resulting columns holding each entry in split. | required | |
sep |
regular expression used to split col. Passed to col.str.split method. | '[^a-zA-Z0-9]' |
|
remove |
whether to remove col from the returned DataFrame. | True |
|
convert |
whether to attempt to convert the split columns to numerics. | False |
|
extra |
what to do when more splits than into names. One of (“warn”, “drop” or “merge”). “warn” produces a warning; “drop” and “merge” currently not implemented. | 'warn' |
|
fill |
what to do when fewer splits than into names. Currently not implemented. | 'warn' |
Examples
>>> import pandas as pd
>>> from siuba import separate
>>> df = pd.DataFrame({"label": ["S1-1", "S2-2"]})
Split into two columns:
>>> separate(df, "label", into = ["season", "episode"])
season episode0 S1 1
1 S2 2
Split, and try to convert columns to numerics:
>>> separate(df, "label", into = ["season", "episode"], convert = True)
season episode0 S1 1
1 S2 2