dply.vector
Functions that implement dplyr vector operations.
Functions
Name | Description |
---|---|
between | Return whether a value is between left and right (inclusive of both endpoints). |
coalesce | Returns a copy of x, with NaN values filled in from *args. Ignores indexes. |
cumall | Return a same-length array. For each entry, indicates whether that entry and all previous are True-like. |
cumany | Return a same-length array. For each entry, indicates whether that entry or any previous are True-like. |
cume_dist | Return the cumulative distribution corresponding to each value in x. |
cummean | Return a same-length array, containing the cumulative mean. |
dense_rank | Return the dense rank. |
desc | Return array sorted in descending order. |
first | Return the first entry of x. Equivalent to nth(x, 0). |
lag | Return an array with each value replaced by the previous (or further backward) value in the array. |
last | Return the last entry of x. Equivalent to nth(x, -1). |
lead | Return an array with each value replaced by the next (or further forward) value in the array. |
min_rank | Return the min rank. See pd.Series.rank with method="min" for details. |
n | Return the total number of elements in the array (or rows in a DataFrame). |
n_distinct | Return the total number of distinct (i.e. unique) elements in an array. |
na_if | Return an array like x, but with values in y replaced by NAs. |
near | TODO: Not Implemented |
nth | Return the nth entry of x. Similar to x[n]. |
ntile | TODO: Not Implemented |
percent_rank | Return the percent rank. |
row_number | Return the row number (position) for each value in x, beginning with 1. |
between
dply.vector.between(x, left, right, default=False)
Notes
This is a thin wrapper around pd.Series.between(left, right).
Examples
>>> between(pd.Series([1,2,3]), 0, 2)
0 True
1 True
2 False
dtype: bool
coalesce
dply.vector.coalesce(x, *args)
Returns a copy of x, with NaN values filled in from *args. Ignores indexes.
Arguments:
x: a pandas Series object
*args: other Series that are the same length as x, or a scalar
Examples
>>> x = pd.Series([1.1, None, None])
>>> abc = pd.Series(['a', 'b', None])
>>> xyz = pd.Series(['x', 'y', 'z'])
>>> coalesce(x, abc)
0 1.1
1 b
2 None
dtype: object
>>> coalesce(x, abc, xyz)
0 1.1
1 b
2 z
dtype: object
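The filling rule can be illustrated with a small sketch. `coalesce_sketch` is a hypothetical stand-in, not the dply.vector implementation; it assumes each argument fills only the gaps still missing after the previous ones, and that values are matched by position rather than by index label (which is what "Ignores indexes" suggests).

```python
import pandas as pd


def coalesce_sketch(x, *args):
    """Hypothetical stand-in for coalesce, for illustration only."""
    # Drop the original index so values are matched by position.
    result = pd.Series(x.to_numpy(dtype=object))
    for other in args:
        if isinstance(other, pd.Series):
            # Match the filler by position too, not by index label.
            fill = pd.Series(other.to_numpy(dtype=object))
        else:
            # A scalar fills every remaining gap.
            fill = other
        # Keep existing values; take the filler wherever result is missing.
        result = result.where(result.notna(), fill)
    return result


x = pd.Series([1.1, None, None], index=[10, 11, 12])
abc = pd.Series(["a", "b", None])
xyz = pd.Series(["x", "y", "z"])
print(coalesce_sketch(x, abc, xyz).tolist())  # [1.1, 'b', 'z']
```

Because the index of x is discarded, the result here is numbered 0 to 2 even though x was labelled 10 to 12.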
cumall
dply.vector.cumall(x)
Return a same-length array. For each entry, indicates whether that entry and all previous are True-like.
Examples
>>> cumall(pd.Series([True, False, False]))
0 True
1 False
2 False
dtype: bool
cumany
dply.vector.cumany(x)
Return a same-length array. For each entry, indicates whether that entry or any previous are True-like.
Examples
>>> cumany(pd.Series([False, True, False]))
0 False
1 True
2 True
dtype: bool
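cumall and cumany are a running logical AND and a running logical OR. The sketch below assumes "True-like" means ordinary truthiness as produced by `astype(bool)` (so, for example, 0 is False and NaN is True); `cumall_sketch` and `cumany_sketch` are hypothetical names, not the library's functions.

```python
import numpy as np
import pandas as pd


def cumall_sketch(x: pd.Series) -> pd.Series:
    """Hypothetical stand-in for cumall: running logical AND."""
    flags = x.astype(bool).to_numpy()  # truthiness of each entry
    # Once a False is seen, every later entry stays False.
    return pd.Series(np.logical_and.accumulate(flags), index=x.index)


def cumany_sketch(x: pd.Series) -> pd.Series:
    """Hypothetical stand-in for cumany: running logical OR."""
    flags = x.astype(bool).to_numpy()
    # Once a True is seen, every later entry stays True.
    return pd.Series(np.logical_or.accumulate(flags), index=x.index)


print(cumall_sketch(pd.Series([1, 2, 0, 3])).tolist())  # [True, True, False, False]
print(cumany_sketch(pd.Series([0, 0, 5, 0])).tolist())  # [False, False, True, True]
```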
cume_dist
dply.vector.cume_dist(x, na_option='keep')
Return the cumulative distribution corresponding to each value in x.
This reflects the proportion of values that are less than or equal to each value.
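One way to compute this proportion is sketched below, assuming missing values stay NaN (as with na_option='keep') and are left out of the denominator. `cume_dist_sketch` is hypothetical; `Series.rank` and `Series.count` are standard pandas methods.

```python
import pandas as pd


def cume_dist_sketch(x: pd.Series) -> pd.Series:
    """Hypothetical stand-in for cume_dist."""
    # With method="max", tied values all get the highest rank of their
    # group, which equals how many entries are <= that value.
    counts_le = x.rank(method="max", na_option="keep")
    # Divide by the number of non-missing values to get a proportion.
    return counts_le / x.count()


print(cume_dist_sketch(pd.Series([1, 2, 2, 3])).tolist())  # [0.25, 0.75, 0.75, 1.0]
```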
cummean
dply.vector.cummean(x)
Return a same-length array, containing the cumulative mean.
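A cumulative mean is the running total divided by the number of entries seen so far. The sketch below (`cummean_sketch` is a hypothetical name) assumes x has no missing values, since how dply.vector treats NaN here is not documented.

```python
import numpy as np
import pandas as pd


def cummean_sketch(x: pd.Series) -> pd.Series:
    """Hypothetical stand-in for cummean (assumes no missing values)."""
    # 1, 2, ..., n: how many entries have been seen at each position.
    seen = np.arange(1, len(x) + 1)
    # Running total divided by that count; x's index is preserved.
    return x.cumsum() / seen


print(cummean_sketch(pd.Series([1, 2, 3, 6])).tolist())  # [1.0, 1.5, 2.0, 3.0]
```

For data without gaps, pandas' `x.expanding().mean()` computes the same running mean.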
dense_rank
dply.vector.dense_rank(x, na_option='keep')
Return the dense rank.
This method of ranking returns values ranging from 1 to the number of unique entries. Tied values share the same rank, and no ranks are skipped after a tie.
Examples
>>> dense_rank(pd.Series([1,3,3,5]))
0 1.0
1 2.0
2 2.0
3 3.0
dtype: float64
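The example output is consistent with pandas' own dense ranking. The sketch below assumes dense_rank behaves like `Series.rank(method="dense")`; `dense_rank_sketch` is a hypothetical name, not the library's code.

```python
import pandas as pd


def dense_rank_sketch(x: pd.Series, na_option: str = "keep") -> pd.Series:
    """Hypothetical stand-in for dense_rank."""
    # "dense": ties share a rank, and the next distinct value gets the
    # next integer, so ranks run from 1 to the number of unique values.
    return x.rank(method="dense", na_option=na_option)


print(dense_rank_sketch(pd.Series([1, 3, 3, 5])).tolist())  # [1.0, 2.0, 2.0, 3.0]
```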
desc
dply.vector.desc(x)
Return array sorted in descending order.
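A minimal sketch of a descending sort in pandas follows. `desc_sketch` is hypothetical, and renumbering the result from 0 (rather than keeping the original index labels) is an assumption, not documented behavior of desc.

```python
import pandas as pd


def desc_sketch(x: pd.Series) -> pd.Series:
    """Hypothetical stand-in for desc."""
    # Largest values first; positions are renumbered from 0 afterwards.
    return x.sort_values(ascending=False).reset_index(drop=True)


print(desc_sketch(pd.Series([3, 1, 2])).tolist())  # [3, 2, 1]
```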
first
dply.vector.first(x, order_by=None, default=None)
Return the first entry of x. Equivalent to nth(x, 0).
lag
dply.vector.lag(x, n=1, default=None)
Return an array with each value replaced by the previous (or further backward) value in the array.
Parameters
Name | Type | Description | Default |
---|---|---|---|
x | | a pandas Series object | required |
n | | number of positions to look backward when replacing each value | 1 |
default | | what to replace the first n values of the array with | None |
Examples
>>> lag(pd.Series([1,2,3]), n=1)
0 NaN
1 1.0
2 2.0
dtype: float64
>>> lag(pd.Series([1,2,3]), n=1, default = 99)
0 99.0
1 1.0
2 2.0
dtype: float64
last
dply.vector.last(x, order_by=None, default=None)
Return the last entry of x. Equivalent to nth(x, -1).
lead
dply.vector.lead(x, n=1, default=None)
Return an array with each value replaced by the next (or further forward) value in the array.
Parameters
Name | Type | Description | Default |
---|---|---|---|
x | | a pandas Series object | required |
n | | number of positions to look forward when replacing each value | 1 |
default | | what to replace the final n values of the array with | None |
Examples
>>> lead(pd.Series([1,2,3]), n=1)
0 2.0
1 3.0
2 NaN
dtype: float64
>>> lead(pd.Series([1,2,3]), n=1, default = 99)
0 2
1 3
2 99
dtype: int64
min_rank
dply.vector.min_rank(x, na_option='keep')
Return the min rank. See pd.Series.rank with method="min" for details.
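Min ranking differs from dense ranking in what happens after a tie. The sketch below assumes min_rank behaves like `Series.rank(method="min")`, as the description suggests; `min_rank_sketch` is a hypothetical name.

```python
import pandas as pd


def min_rank_sketch(x: pd.Series, na_option: str = "keep") -> pd.Series:
    """Hypothetical stand-in for min_rank."""
    # "min": tied values share the lowest rank of their group, and the
    # ranks they would otherwise occupy are skipped.
    return x.rank(method="min", na_option=na_option)


# The two 3s share rank 2, so rank 3 is skipped and 5 gets rank 4
# (dense ranking would give it 3).
print(min_rank_sketch(pd.Series([1, 3, 3, 5])).tolist())  # [1.0, 2.0, 2.0, 4.0]
```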
n
dply.vector.n(x)
Return the total number of elements in the array (or rows in a DataFrame).
Examples
>>> ser = pd.Series([1,2,3])
>>> n(ser)
3
>>> df = pd.DataFrame({'x': ser})
>>> n(df)
3
n_distinct
dply.vector.n_distinct(x)
Return the total number of distinct (i.e. unique) elements in an array.
Examples
>>> n_distinct(pd.Series([1,1,2,2]))
2
na_if
dply.vector.na_if(x, y)
Return an array like x, but with values in y replaced by NAs.
Examples
>>> na_if(pd.Series([1,2,3]), [1,3])
0 NaN
1 2.0
2 NaN
dtype: float64
near
dply.vector.near(x)
TODO: Not Implemented
nth
dply.vector.nth(x, n, order_by=None, default=None)
Return the nth entry of x. Similar to x[n].
Parameters
Name | Type | Description | Default |
---|---|---|---|
x | | series to get entry from. | required |
n | | position of entry to get from x (0 indicates first entry). | required |
order_by | | optional Series used to reorder x. | None |
default | | (not implemented) value to return if there is no entry at n. | None |
Notes
first(x) and last(x) are equivalent to nth(x, 0) and nth(x, -1), respectively.
Examples
>>> ser = pd.Series(['a', 'b', 'c'])
>>> nth(ser, 1)
'b'
>>> sorter = pd.Series([1, 2, 0])
>>> nth(ser, 1, order_by = sorter)
'a'
>>> nth(ser, 0), nth(ser, -1)
('a', 'c')
>>> first(ser), last(ser)
('a', 'c')
ntile
dply.vector.ntile(x, n)
TODO: Not Implemented
percent_rank
dply.vector.percent_rank(x, na_option='keep')
Return the percent rank.
Notes
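Uses the minimum rank, rescaled to run from 0 to 1: each entry gets (min_rank - 1) / (n - 1), where n is the number of non-missing values. A Series with a single value therefore gives NaN (0 / 0).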
Examples
>>> percent_rank(pd.Series([1, 2, 3]))
0 0.0
1 0.5
2 1.0
dtype: float64
>>> percent_rank(pd.Series([1, 2, 2]))
0 0.0
1 0.5
2 0.5
dtype: float64
>>> percent_rank(pd.Series([1]))
0 NaN
dtype: float64
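The rescaling can be written directly in pandas. `percent_rank_sketch` is a hypothetical stand-in that assumes the denominator counts only non-missing values; it reproduces the three examples above.

```python
import pandas as pd


def percent_rank_sketch(x: pd.Series) -> pd.Series:
    """Hypothetical stand-in for percent_rank."""
    min_ranks = x.rank(method="min")  # ties share the lowest rank
    # Rescale ranks 1..n to 0..1; a single value gives 0 / 0 = NaN.
    return (min_ranks - 1) / (x.count() - 1)


print(percent_rank_sketch(pd.Series([1, 2, 2])).tolist())  # [0.0, 0.5, 0.5]
```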
row_number
dply.vector.row_number(x)
Return the row number (position) for each value in x, beginning with 1.
Examples
>>> ser = pd.Series([7,8])
>>> row_number(ser)
0 1
1 2
dtype: int64
>>> row_number(pd.DataFrame({'a': ser}))
0 1
1 2
dtype: int64
>>> row_number(pd.Series([7,8], index = [3, 4]))
3 1
4 2
dtype: int64