R to Python
String methods vs stringr
Pandas lets you apply the same slice to every string in a Series, but it does not let you apply a different slice to each string (a la stringr::str_sub). This means there is no easy equivalent to using the results of stringr::str_locate to subset strings.

While most Pandas string methods are under the .str accessor, the ones for ordering are not. To replicate stringr::str_order() and stringr::str_sort(), use .argsort() and .sort_values().

stringr has an *_all() variant of several functions (e.g. str_replace, str_locate, str_extract, str_match). Pandas generally has equivalent behavior, but it is sometimes specified by using an alternative method (e.g. str.extractall()), and sometimes by using an argument (e.g. str.replace(..., n=1)).

Pandas string methods are modeled after both Python str object methods and stringr (this is mentioned in the .str accessor source code). However, it's not always clear which methods accept a regex (similar to stringr) and which do not (similar to str object methods).

For example, .str.count() only accepts a regex, while .str.startswith() does not accept one. Other methods, like str.contains(), accept a regex by default, but this can be disabled using the regex argument. This is not a big issue in practice, but it warrants some caution and a teaching strategy.
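
To make the slicing limitation concrete, here is a minimal sketch of one possible workaround: slice each string in plain Python and rebuild the Series. The data and the names s, starts, and stops are hypothetical, and .str.find() is used only as a literal-substring stand-in for a locate step; it is not a regex-aware equivalent of stringr::str_locate.

```python
import pandas as pd

s = pd.Series(["apple pie", "banana split", "cherry tart"])

# .str.slice() applies one start/stop pair to every string.
same_slice = s.str.slice(0, 3)

# Hypothetical per-row bounds, for illustration only.
starts = [0, 7, 2]
stops = [5, 12, 6]

# Workaround: slice row by row in Python, keeping the original index.
custom_slice = pd.Series(
    [text[start:stop] for text, start, stop in zip(s, starts, stops)],
    index=s.index,
)

# Locate the first space (literal substring search), then use the
# positions to subset each string.
space_pos = s.str.find(" ")
first_words = pd.Series(
    [text[:pos] for text, pos in zip(s, space_pos)],
    index=s.index,
)
```

The row-by-row approach gives up vectorization, which is the practical cost of the missing per-string slice.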
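
The ordering methods mentioned above live directly on the Series rather than under .str. A small sketch with made-up data:

```python
import pandas as pd

s = pd.Series(["pear", "apple", "fig"])

# Counterpart of stringr::str_order(): positions that would sort the Series.
order = s.argsort()

# Counterpart of stringr::str_sort(): the sorted values themselves.
sorted_s = s.sort_values()

# Indexing by the order positions reproduces the sorted values.
reordered = s.iloc[order.to_numpy()]
```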
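
The two ways Pandas expresses the first-match versus all-matches distinction can be sketched as follows. The data is made up, and regex=True is passed explicitly so the example does not depend on the default of a particular Pandas version.

```python
import pandas as pd

s = pd.Series(["a1b2", "c3"])

# Separate methods: .str.extract() keeps the first match per string,
# .str.extractall() keeps every match (one row per match).
first_digit = s.str.extract(r"(\d)")
all_digits = s.str.extractall(r"(\d)")

# An argument: n limits the number of replacements per string.
replace_first = s.str.replace(r"\d", "#", n=1, regex=True)
replace_all = s.str.replace(r"\d", "#", regex=True)
```

Note that .str.extractall() returns a DataFrame with a MultiIndex (original index, match number), so its shape differs from the input Series.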
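
A short illustration of the regex versus literal distinction, using a pattern containing a dot so the difference is visible. The data is hypothetical.

```python
import pandas as pd

s = pd.Series(["a.b", "axb", ".ab"])

# .str.count() treats the pattern as a regex: "." matches any character.
count_any = s.str.count(r".")
count_dot = s.str.count(r"\.")  # escape the dot to count literal dots

# .str.startswith() treats the pattern as a literal string.
starts_with_dot = s.str.startswith(".")

# .str.contains() uses a regex by default; regex=False makes it literal.
contains_regex = s.str.contains("a.b")
contains_literal = s.str.contains("a.b", regex=False)
```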