dply.vector

Functions that implement dplyr vector operations.

Functions

Name	Description
between	Return whether a value is between left and right (including either side).
coalesce	Returns a copy of x, with NaN values filled in from *args. Ignores indexes.
cumall	Return a same-length array. For each entry, indicates whether that entry and all previous are True-like.
cumany	Return a same-length array. For each entry, indicates whether that entry or any previous are True-like.
cume_dist	Return the cumulative distribution corresponding to each value in x.
cummean	Return a same-length array, containing the cumulative mean.
dense_rank	Return the dense rank.
desc	Return array sorted in descending order.
first	Return the first entry of x. Equivalent to nth(x, 0).
lag	Return an array with each value replaced by the previous (or further backward) value in the array.
last	Return the last entry of x. Equivalent to nth(x, -1).
lead	Return an array with each value replaced by the next (or further forward) value in the array.
min_rank	Return the min rank. See pd.Series.rank with method="min" for details.
n	Return the total number of elements in the array (or rows in a DataFrame).
n_distinct	Return the total number of distinct (i.e. unique) elements in an array.
na_if	Return an array like x, but with any values that appear in y replaced by NAs.
near	TODO: Not Implemented
nth	Return the nth entry of x. Similar to `x[n]`.
ntile	TODO: Not Implemented
percent_rank	Return the percent rank.
row_number	Return the row number (position) for each value in x, beginning with 1.

between

dply.vector.between(x, left, right, default=False)

Return whether a value is between left and right (including either side).

Notes

This is a thin wrapper around pd.Series.between(left, right)

Examples

>>> between(pd.Series([1,2,3]), 0, 2)
0     True
1     True
2    False
dtype: bool
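Because this function wraps pd.Series.between, the same result can be reproduced with pandas alone. The sketch below assumes only pandas and calls the Series method directly; it is an illustration, not the library's own code.

```python
import pandas as pd

ser = pd.Series([1, 2, 3])

# Both bounds are inclusive by default, so 2 counts as "between 0 and 2".
result = ser.between(0, 2)
print(result)
```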

coalesce

dply.vector.coalesce(x, *args)

Returns a copy of x, with NaN values filled in from *args. Ignores indexes.

Parameters

Name	Description
`x`	a pandas Series object
`*args`	other Series that are the same length as x, or a scalar

Examples

>>> x = pd.Series([1.1, None, None])
>>> abc = pd.Series(['a', 'b', None])
>>> xyz = pd.Series(['x', 'y', 'z'])
>>> coalesce(x, abc)
0     1.1
1       b
2    None
dtype: object

>>> coalesce(x, abc, xyz)
0    1.1
1      b
2      z
dtype: object
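To make "ignores indexes" concrete, here is a minimal pandas/numpy sketch of positional filling. The name `coalesce_sketch` is hypothetical and the body is an assumption about equivalent behavior, not the library's implementation.

```python
import pandas as pd

def coalesce_sketch(x, *args):
    """Hypothetical stand-in for coalesce: fill missing values by position."""
    values = x.to_numpy(dtype=object).copy()
    for arg in args:
        missing = pd.isna(values)  # positions that are still unfilled
        if isinstance(arg, pd.Series):
            # Match by position, not by index label.
            values[missing] = arg.to_numpy(dtype=object)[missing]
        else:
            # A scalar fills every remaining gap.
            values[missing] = arg
    return pd.Series(values, index=x.index)

x = pd.Series([1.1, None, None])
# The index differs from x on purpose; only positions matter here.
abc = pd.Series(["a", "b", None], index=[10, 11, 12])
filled = coalesce_sketch(x, abc, "z")
print(filled)
```

The result keeps the index of x, and the trailing scalar fills whatever the Series arguments left missing.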

cumall

dply.vector.cumall(x)

Return a same-length array. For each entry, indicates whether that entry and all previous are True-like.

Examples

>>> cumall(pd.Series([True, False, False]))
0     True
1    False
2    False
dtype: bool

cumany

dply.vector.cumany(x)

Return a same-length array. For each entry, indicates whether that entry or any previous are True-like.

Examples

>>> cumany(pd.Series([False, True, False]))
0    False
1     True
2     True
dtype: bool
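cumall and cumany behave like a running AND and a running OR. The sketch below reproduces that behavior with numpy's ufunc accumulate; it is an assumed equivalent for illustration and may differ from the library's code in how it treats missing values.

```python
import numpy as np
import pandas as pd

all_input = pd.Series([True, False, True])
any_input = pd.Series([False, True, False])

# Running AND: True only until the first False is seen.
running_all = pd.Series(
    np.logical_and.accumulate(all_input.to_numpy(dtype=bool)),
    index=all_input.index,
)

# Running OR: switches to True at the first True and stays there.
running_any = pd.Series(
    np.logical_or.accumulate(any_input.to_numpy(dtype=bool)),
    index=any_input.index,
)

print(running_all)
print(running_any)
```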

cume_dist

dply.vector.cume_dist(x, na_option='keep')

Return the cumulative distribution corresponding to each value in x.

This reflects the proportion of values that are less than or equal to each value.
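As a worked illustration of "proportion of values less than or equal to each value", the following pandas-only sketch computes that proportion from the maximum rank. It is one way to express the definition, under the assumption that there are no missing values; the library may compute it differently.

```python
import pandas as pd

ser = pd.Series([1, 2, 2, 4])

# rank(method="max") gives, for each entry, how many values are <= it;
# dividing by the number of non-missing values yields the proportion.
cume = ser.rank(method="max") / ser.count()
print(cume)
```

For this input, one of four values is at most 1, three of four are at most 2, and all four are at most 4.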

cummean

dply.vector.cummean(x)

Return a same-length array, containing the cumulative mean.
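A cumulative mean is the running sum divided by the running count. The sketch below shows two pandas-only ways to get it; both are assumptions about equivalent behavior rather than the library's implementation.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3, 4])

# An expanding window grows by one element per row.
running_mean = ser.expanding().mean()

# The same values from the definition: cumulative sum / position.
by_definition = ser.cumsum() / np.arange(1, len(ser) + 1)

print(running_mean)
```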

dense_rank

dply.vector.dense_rank(x, na_option='keep')

Return the dense rank.

This method of ranking returns values ranging from 1 to the number of unique entries. Ties are all given the same ranking.

Examples

>>> dense_rank(pd.Series([1,3,3,5]))
0    1.0
1    2.0
2    2.0
3    3.0
dtype: float64

desc

dply.vector.desc(x)

Return array sorted in descending order.

first

dply.vector.first(x, order_by=None, default=None)

Return the first entry of x. Equivalent to nth(x, 0).

lag

dply.vector.lag(x, n=1, default=None)

Return an array with each value replaced by the previous (or further backward) value in the array.

Parameters

Name	Description	Default
`x`	a pandas Series object	required
`n`	number of positions backward to take each replacement value from	`1`
`default`	what to replace the n initial values of the array with	`None`

Examples

>>> lag(pd.Series([1,2,3]), n=1)
0    NaN
1    1.0
2    2.0
dtype: float64

>>> lag(pd.Series([1,2,3]), n=1, default = 99)
0    99.0
1     1.0
2     2.0
dtype: float64

last

dply.vector.last(x, order_by=None, default=None)

Return the last entry of x. Equivalent to nth(x, -1).

lead

dply.vector.lead(x, n=1, default=None)

Return an array with each value replaced by the next (or further forward) value in the array.

Parameters

Name	Description	Default
`x`	a pandas Series object	required
`n`	number of positions forward to take each replacement value from	`1`
`default`	what to replace the n final values of the array with	`None`

Examples

>>> lead(pd.Series([1,2,3]), n=1)
0    2.0
1    3.0
2    NaN
dtype: float64

>>> lead(pd.Series([1,2,3]), n=1, default = 99)
0     2
1     3
2    99
dtype: int64
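Both lag and lead correspond to shifting a Series by n positions. The sketch below uses pd.Series.shift directly to show the direction of each shift and the effect of a fill value; treating shift as the equivalent operation is an assumption for illustration.

```python
import pandas as pd

ser = pd.Series([1, 2, 3])

# Positive shift moves values down: each row sees the previous value (lag).
lagged = ser.shift(1)

# Negative shift moves values up: each row sees the next value (lead).
led = ser.shift(-1)

# fill_value replaces the vacated positions instead of leaving NaN.
led_filled = ser.shift(-1, fill_value=99)

print(lagged)
print(led)
print(led_filled)
```

Without a fill value the vacated position becomes NaN, which forces a float result; with an integer fill value the Series can keep an integer dtype.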

min_rank

dply.vector.min_rank(x, na_option='keep')

Return the min rank. See pd.Series.rank with method="min" for details.
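The difference between min rank and dense rank only shows up after a tie. This sketch calls pd.Series.rank directly with both methods on the same input used in the dense_rank example.

```python
import pandas as pd

ser = pd.Series([1, 3, 3, 5])

# "min": tied values share the lowest rank, and the next rank is skipped.
min_ranks = ser.rank(method="min")

# "dense": tied values share a rank, and no rank is skipped afterwards.
dense_ranks = ser.rank(method="dense")

print(min_ranks)
print(dense_ranks)
```

Here the value 5 gets min rank 4 (three values precede it) but dense rank 3 (two distinct values precede it).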

n

dply.vector.n(x)

Return the total number of elements in the array (or rows in a DataFrame).

Examples

>>> ser = pd.Series([1,2,3])
>>> n(ser)
3

>>> df = pd.DataFrame({'x': ser})
>>> n(df)
3

n_distinct

dply.vector.n_distinct(x)

Return the total number of distinct (i.e. unique) elements in an array.

Examples

>>> n_distinct(pd.Series([1,1,2,2]))
2

na_if

dply.vector.na_if(x, y)

Return an array like x, but with any values that appear in y replaced by NAs.

Examples

>>> na_if(pd.Series([1,2,3]), [1,3])
0    NaN
1    2.0
2    NaN
dtype: float64
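The example above can be reproduced with a membership test followed by masking. This is a pandas-only sketch of the idea, offered as an assumed equivalent rather than the library's code.

```python
import pandas as pd

ser = pd.Series([1, 2, 3])

# isin marks entries found in the list; mask replaces marked entries with NaN.
masked = ser.mask(ser.isin([1, 3]))
print(masked)
```

Introducing NaN into an integer Series converts it to float64, which is why the surviving value prints as 2.0.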

near

dply.vector.near(x)

TODO: Not Implemented
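While this function is not implemented, tolerance-based comparison of floating point values is available from numpy. The sketch below uses numpy.isclose, which takes two arrays; that two-argument form is numpy's interface, not a statement about what near will eventually accept.

```python
import numpy as np
import pandas as pd

left = pd.Series([0.1 + 0.2, 1.0])
right = pd.Series([0.3, 1.1])

# Exact equality fails for 0.1 + 0.2 because of floating point rounding.
exact = left == right

# isclose compares within a tolerance, so the first pair matches.
close = pd.Series(np.isclose(left, right), index=left.index)

print(exact)
print(close)
```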

nth

dply.vector.nth(x, n, order_by=None, default=None)

Return the nth entry of x. Similar to x[n].

Parameters

Name	Description	Default
`x`	series to get entry from.	required
`n`	position of entry to get from x (0 indicates first entry).	required
`order_by`	optional Series used to reorder x.	`None`
`default`	(not implemented) value to return if no entry at n.	`None`

Notes

first(x) and last(x) are nth(x, 0) and nth(x, -1).

Examples

>>> ser = pd.Series(['a', 'b', 'c'])
>>> nth(ser, 1)
'b'

>>> sorter = pd.Series([1, 2, 0])
>>> nth(ser, 1, order_by = sorter)
'a'

>>> nth(ser, 0), nth(ser, -1)
('a', 'c')

>>> first(ser), last(ser)
('a', 'c')
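The effect of order_by can be sketched as "sort x by the values of order_by, then index by position". The helper `nth_sketch` below is hypothetical and written only with pandas and numpy; it mirrors the examples above but is not the library's implementation.

```python
import numpy as np
import pandas as pd

def nth_sketch(x, n, order_by=None):
    """Hypothetical stand-in for nth: positional lookup after optional reordering."""
    if order_by is not None:
        # Positions that would sort order_by in ascending order.
        order = np.argsort(order_by.to_numpy(), kind="stable")
        x = x.iloc[order]
    return x.iloc[n]

ser = pd.Series(["a", "b", "c"])
sorter = pd.Series([1, 2, 0])

plain = nth_sketch(ser, 1)
reordered = nth_sketch(ser, 1, order_by=sorter)
ends = (nth_sketch(ser, 0), nth_sketch(ser, -1))
print(plain, reordered, ends)
```

Sorting by `sorter` puts the entries in the order c, a, b, so position 1 holds 'a'.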

ntile

dply.vector.ntile(x, n)

TODO: Not Implemented

percent_rank

dply.vector.percent_rank(x, na_option='keep')

Return the percent rank.

Notes

Uses minimum rank, and reports the proportion of unique ranks each entry is greater than.

Examples

>>> percent_rank(pd.Series([1, 2, 3]))
0    0.0
1    0.5
2    1.0
dtype: float64

>>> percent_rank(pd.Series([1, 2, 2]))
0    0.0
1    0.5
2    0.5
dtype: float64

>>> percent_rank(pd.Series([1]))
0   NaN
dtype: float64
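One formula consistent with the three examples above is (min rank - 1) / (n - 1). The helper `percent_rank_sketch` below is hypothetical and assumes no missing values; it illustrates the arithmetic rather than reproducing the library's code.

```python
import pandas as pd

def percent_rank_sketch(x):
    """Hypothetical stand-in: rescale min ranks to the range 0..1."""
    min_rank = x.rank(method="min")
    # With a single element this is 0 / 0, which pandas reports as NaN.
    return (min_rank - 1) / (x.count() - 1)

distinct = percent_rank_sketch(pd.Series([1, 2, 3]))
tied = percent_rank_sketch(pd.Series([1, 2, 2]))
single = percent_rank_sketch(pd.Series([1]))
print(distinct, tied, single, sep="\n")
```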

row_number

dply.vector.row_number(x)

Return the row number (position) for each value in x, beginning with 1.

Examples

>>> ser = pd.Series([7,8])
>>> row_number(ser)
0    1
1    2
dtype: int64

>>> row_number(pd.DataFrame({'a': ser}))
0    1
1    2
dtype: int64

>>> row_number(pd.Series([7,8], index = [3, 4]))
3    1
4    2
dtype: int64
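The examples show that row numbers depend on position only, while the result keeps the input's index. A minimal sketch of that behavior, using a hypothetical helper `row_number_sketch` built on numpy.arange:

```python
import numpy as np
import pandas as pd

def row_number_sketch(x):
    """Hypothetical stand-in: 1-based positions, labelled with x's index."""
    # len() counts elements of a Series or rows of a DataFrame.
    return pd.Series(np.arange(1, len(x) + 1), index=x.index)

from_series = row_number_sketch(pd.Series([7, 8], index=[3, 4]))
from_frame = row_number_sketch(pd.DataFrame({"a": [7, 8]}))
print(from_series)
print(from_frame)
```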