dply.vector

Functions that implement dplyr vector operations.

Functions

Name          Description
between       Return whether a value is between left and right (inclusive of both endpoints).
coalesce      Return a copy of x, with NaN values filled in from *args. Ignores indexes.
cumall        Return a same-length array. For each entry, indicates whether that entry and all previous are True-like.
cumany        Return a same-length array. For each entry, indicates whether that entry or any previous are True-like.
cume_dist     Return the cumulative distribution corresponding to each value in x.
cummean       Return a same-length array, containing the cumulative mean.
dense_rank    Return the dense rank.
desc          Return an array sorted in descending order.
first         Return the first entry of x. Equivalent to nth(x, 0).
lag           Return an array with each value replaced by the previous (or further backward) value in the array.
last          Return the last entry of x. Equivalent to nth(x, -1).
lead          Return an array with each value replaced by the next (or further forward) value in the array.
min_rank      Return the min rank. See pd.Series.rank with method="min" for details.
n             Return the total number of elements in the array (or rows in a DataFrame).
n_distinct    Return the total number of distinct (i.e. unique) elements in an array.
na_if         Return an array like x, but with values in y replaced by NAs.
near          TODO: Not Implemented
nth           Return the nth entry of x. Similar to x[n].
ntile         TODO: Not Implemented
percent_rank  Return the percent rank.
row_number    Return the row number (position) for each value in x, beginning with 1.

between

dply.vector.between(x, left, right, default=False)

Notes

This is a thin wrapper around pd.Series.between(left, right).

Examples

>>> between(pd.Series([1,2,3]), 0, 2)
0     True
1     True
2    False
dtype: bool
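
Because between is described as a thin wrapper around pd.Series.between, the inclusive bounds can be spelled out as two comparisons. The helper below, between_sketch, is hypothetical (not part of dply.vector); it only illustrates the default behaviour of the pandas method, which includes both endpoints.

```python
import pandas as pd


def between_sketch(x: pd.Series, left, right) -> pd.Series:
    # Inclusive on both ends, matching the default of pd.Series.between
    return (x >= left) & (x <= right)


x = pd.Series([1, 2, 3])
# Same result as the wrapped pandas method
assert between_sketch(x, 0, 2).equals(x.between(0, 2))
```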

coalesce

dply.vector.coalesce(x, *args)

Return a copy of x, with NaN values filled in from *args. Ignores indexes: values are matched by position.

Arguments

x: a pandas Series object
*args: other Series that are the same length as x, or a scalar

Examples

>>> x = pd.Series([1.1, None, None])
>>> abc = pd.Series(['a', 'b', None])
>>> xyz = pd.Series(['x', 'y', 'z'])
>>> coalesce(x, abc)
0     1.1
1       b
2    None
dtype: object

>>> coalesce(x, abc, xyz)
0    1.1
1      b
2      z
dtype: object
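
The fill rule can be sketched as a loop that, for each argument in turn, fills only the positions that are still missing. This is a minimal sketch under the assumption that "ignores indexes" means positional matching; coalesce_sketch is a hypothetical name, not the library's implementation.

```python
import pandas as pd


def coalesce_sketch(x: pd.Series, *args) -> pd.Series:
    # Object dtype so values of mixed types (floats, strings) can coexist
    out = x.astype(object)
    for y in args:
        # Positions that are still missing after the previous fills
        mask = out.isna().to_numpy()
        if not mask.any():
            break
        if isinstance(y, pd.Series):
            # Raw values, so alignment is by position rather than index label
            out.iloc[mask] = y.to_numpy()[mask]
        else:
            # A scalar fills every remaining gap
            out.iloc[mask] = y
    return out
```

Working on raw NumPy values is the design choice that makes the index irrelevant: two Series with different labels are still combined position by position.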

cumall

dply.vector.cumall(x)

Return a same-length array. For each entry, indicates whether that entry and all previous are True-like.

Examples

>>> cumall(pd.Series([True, False, False]))
0     True
1    False
2    False
dtype: bool

cumany

dply.vector.cumany(x)

Return a same-length array. For each entry, indicates whether that entry or any previous are True-like.

Examples

>>> cumany(pd.Series([False, True, False]))
0    False
1     True
2     True
dtype: bool
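
Both cumall and cumany are running logical reductions: a cumulative AND and a cumulative OR. The sketch below uses NumPy's ufunc accumulate to show that idea; the function names are hypothetical, and it assumes "True-like" means truthy under astype(bool), so missing values are not treated specially here.

```python
import numpy as np
import pandas as pd


def cumall_sketch(x: pd.Series) -> pd.Series:
    # Running AND: stays True until the first False-like entry, then False forever
    values = np.logical_and.accumulate(x.astype(bool).to_numpy())
    return pd.Series(values, index=x.index)


def cumany_sketch(x: pd.Series) -> pd.Series:
    # Running OR: becomes True at the first True-like entry and stays True
    values = np.logical_or.accumulate(x.astype(bool).to_numpy())
    return pd.Series(values, index=x.index)
```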

cume_dist

dply.vector.cume_dist(x, na_option='keep')

Return the cumulative distribution corresponding to each value in x.

This reflects the proportion of values that are less than or equal to each value.
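
One way to compute "the proportion of values less than or equal to each value" is to rank with ties sharing the largest rank and divide by the number of values. This is a hypothetical sketch, assuming missing values keep a NaN rank (as with na_option='keep' in pd.Series.rank) and are excluded from the denominator.

```python
import pandas as pd


def cume_dist_sketch(x: pd.Series) -> pd.Series:
    # method="max": each value's rank is the count of entries <= it
    ranks = x.rank(method="max", na_option="keep")
    # Divide by the number of non-missing values to get a proportion
    return ranks / x.count()


# For [1, 2, 2, 3] the proportions are 0.25, 0.75, 0.75, 1.0
print(cume_dist_sketch(pd.Series([1, 2, 2, 3])).tolist())
```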

cummean

dply.vector.cummean(x)

Return a same-length array, containing the cumulative mean.
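
A cumulative mean is the mean of every entry up to and including the current one. In pandas this can be expressed with an expanding window; the helper name is hypothetical, and note that expanding().mean() skips NaN values rather than propagating them, which may differ from the library.

```python
import pandas as pd


def cummean_sketch(x: pd.Series) -> pd.Series:
    # Mean of all entries seen so far (window grows by one at each step)
    return x.expanding().mean()


# [1, 2, 3, 4] -> 1.0, 1.5, 2.0, 2.5
print(cummean_sketch(pd.Series([1, 2, 3, 4])).tolist())
```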

dense_rank

dply.vector.dense_rank(x, na_option='keep')

Return the dense rank.

This method of ranking returns values ranging from 1 to the number of unique entries. Ties are all given the same ranking.

Examples

>>> dense_rank(pd.Series([1,3,3,5]))
0    1.0
1    2.0
2    2.0
3    3.0
dtype: float64

desc

dply.vector.desc(x)

Return an array sorted in descending order.

first

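dply.vector.first(x, order_by=None, default=None)

Return the first entry of x. Equivalent to nth(x, 0); see nth for the order_by and default parameters.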

lag

dply.vector.lag(x, n=1, default=None)

Return an array with each value replaced by the previous (or further backward) value in the array.

Parameters

Name Type Description Default
x a pandas Series object required
n number of positions earlier to take each replacement value from 1
default value used to fill the first n entries, which have no earlier value None

Examples

>>> lag(pd.Series([1,2,3]), n=1)
0    NaN
1    1.0
2    2.0
dtype: float64
>>> lag(pd.Series([1,2,3]), n=1, default=99)
0    99.0
1     1.0
2     2.0
dtype: float64

last

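dply.vector.last(x, order_by=None, default=None)

Return the last entry of x. Equivalent to nth(x, -1); see nth for the order_by and default parameters.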

lead

dply.vector.lead(x, n=1, default=None)

Return an array with each value replaced by the next (or further forward) value in the array.

Parameters

Name Type Description Default
x a pandas Series object required
n number of positions later to take each replacement value from 1
default value used to fill the last n entries, which have no later value None

Examples

>>> lead(pd.Series([1,2,3]), n=1)
0    2.0
1    3.0
2    NaN
dtype: float64
>>> lead(pd.Series([1,2,3]), n=1, default=99)
0     2
1     3
2    99
dtype: int64
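
Both lag and lead amount to shifting the Series by n positions and filling the exposed end with default. The sketch below uses pd.Series.shift with fill_value; the function names are hypothetical, and the resulting dtypes may not match the documented outputs exactly (for example, this lag_sketch keeps int64 when an integer default is given, whereas the lag example above shows float64).

```python
import pandas as pd


def lag_sketch(x: pd.Series, n: int = 1, default=None) -> pd.Series:
    # Shift values down by n; the first n positions have no earlier value
    if default is None:
        return x.shift(n)
    return x.shift(n, fill_value=default)


def lead_sketch(x: pd.Series, n: int = 1, default=None) -> pd.Series:
    # Shift values up by n; the last n positions have no later value
    if default is None:
        return x.shift(-n)
    return x.shift(-n, fill_value=default)
```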

min_rank

dply.vector.min_rank(x, na_option='keep')

Return the min rank. See pd.Series.rank with method="min" for details.
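
The pandas call referenced above can be compared with dense ranking on data containing a tie. With method="min", tied values share the smallest rank and the following rank is skipped; with method="dense", no rank is skipped. This uses pd.Series.rank directly and only illustrates the ranking methods, not the library's own code.

```python
import pandas as pd

x = pd.Series([1, 3, 3, 5])
# Ties share the smallest rank, and the next rank is skipped: 1, 2, 2, 4
min_ranks = x.rank(method="min", na_option="keep")
# Dense ranking does not skip after a tie: 1, 2, 2, 3
dense_ranks = x.rank(method="dense", na_option="keep")
```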

n

dply.vector.n(x)

Return the total number of elements in the array (or rows in a DataFrame).

Examples

>>> ser = pd.Series([1,2,3])
>>> n(ser)
3
>>> df = pd.DataFrame({'x': ser})
>>> n(df)
3

n_distinct

dply.vector.n_distinct(x)

Return the total number of distinct (i.e. unique) elements in an array.

Examples

>>> n_distinct(pd.Series([1,1,2,2]))
2

na_if

dply.vector.na_if(x, y)

Return an array like x, but with any values that appear in y replaced by NAs.

Examples

>>> na_if(pd.Series([1,2,3]), [1,3])
0    NaN
1    2.0
2    NaN
dtype: float64
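
The replacement can be sketched as masking every entry that appears in y. This hypothetical helper assumes y is list-like (pd.Series.isin rejects a bare scalar); integer input is upcast to float so that NaN can be stored, which matches the float64 output in the example.

```python
import pandas as pd


def na_if_sketch(x: pd.Series, y) -> pd.Series:
    # mask() replaces entries where the condition is True with NaN
    return x.mask(x.isin(y))
```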

near

dply.vector.near(x)

TODO: Not Implemented

nth

dply.vector.nth(x, n, order_by=None, default=None)

Return the nth entry of x. Similar to x[n].

Parameters

Name Type Description Default
x series to get entry from. required
n position of entry to get from x (0 indicates first entry). required
order_by optional Series used to reorder x. None
default (not implemented) value to return if no entry at n. None

Notes

first(x) and last(x) are equivalent to nth(x, 0) and nth(x, -1), respectively.

Examples

>>> ser = pd.Series(['a', 'b', 'c'])
>>> nth(ser, 1)
'b'
>>> sorter = pd.Series([1, 2, 0])
>>> nth(ser, 1, order_by=sorter)
'a'
>>> nth(ser, 0), nth(ser, -1)
('a', 'c')
>>> first(ser), last(ser)
('a', 'c')
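
The order_by behaviour in the examples is consistent with reordering x by the positions that would sort order_by, then taking the nth entry by position. A minimal sketch under that assumption (nth_sketch is hypothetical and, like the documented default parameter, does not handle out-of-range n; it raises IndexError instead):

```python
import numpy as np
import pandas as pd


def nth_sketch(x: pd.Series, n: int, order_by=None):
    if order_by is not None:
        # Reorder x by the positions that would sort order_by (stable for ties)
        x = x.iloc[np.argsort(np.asarray(order_by), kind="stable")]
    # Positional lookup, so negative n counts from the end
    return x.iloc[n]
```

With sorter = [1, 2, 0], the argsort is [2, 0, 1], so x is reordered to ['c', 'a', 'b'] and position 1 yields 'a'.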

ntile

dply.vector.ntile(x, n)

TODO: Not Implemented

percent_rank

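dply.vector.percent_rank(x, na_option='keep')

Return the percent rank.

Notes

Uses the minimum rank, rescaled to run from 0 to 1: each entry gets (min_rank - 1) / (n - 1), where n is the number of non-missing values. A single value therefore gives 0 / 0, which is NaN.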

Examples

>>> percent_rank(pd.Series([1, 2, 3]))
0    0.0
1    0.5
2    1.0
dtype: float64
>>> percent_rank(pd.Series([1, 2, 2]))
0    0.0
1    0.5
2    0.5
dtype: float64
>>> percent_rank(pd.Series([1]))
0   NaN
dtype: float64
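
The rescaling reproduces all three examples above. The sketch assumes n counts non-missing values (pd.Series.count); percent_rank_sketch is a hypothetical name, not the library function.

```python
import pandas as pd


def percent_rank_sketch(x: pd.Series) -> pd.Series:
    # Minimum rank: ties share the lowest position
    ranks = x.rank(method="min", na_option="keep")
    # Rescale to [0, 1]; with one non-missing value this is 0 / 0, i.e. NaN
    return (ranks - 1) / (x.count() - 1)
```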

row_number

dply.vector.row_number(x)

Return the row number (position) for each value in x, beginning with 1.

Examples

>>> ser = pd.Series([7,8])
>>> row_number(ser)
0    1
1    2
dtype: int64
>>> row_number(pd.DataFrame({'a': ser}))
0    1
1    2
dtype: int64
>>> row_number(pd.Series([7,8], index=[3, 4]))
3    1
4    2
dtype: int64
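
The examples show that row numbers are positional (starting at 1) while the result keeps the input's index labels, for both a Series and a DataFrame. A hypothetical sketch of that behaviour, relying only on len() and .index, which both types provide:

```python
import numpy as np
import pandas as pd


def row_number_sketch(x) -> pd.Series:
    # 1-based positions, labelled with x's own index
    return pd.Series(np.arange(1, len(x) + 1), index=x.index)
```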