R to Python
String methods vs stringr
Pandas lets you slice every string in a Series with the same start and stop (via .str.slice() or .str[start:stop]). It does not let you apply a different slice to each string (à la stringr::str_sub()). As a result, there is no easy equivalent to using the positions returned by stringr::str_locate() to subset strings.

Most pandas string methods live under the .str accessor, but the ones for ordering do not. For the equivalents of stringr::str_order() and stringr::str_sort(), use .argsort() and .sort_values().

stringr has an *_all() variant of several functions (e.g. str_replace(), str_locate(), str_extract(), str_match()). Pandas generally offers equivalent behavior. Sometimes this is through a separate method (e.g. .str.extractall()). Other times it is through an argument (e.g. .str.replace(..., n=1)). Pandas replaces all matches by default, so n=1 limits replacement to the first match.

Pandas string methods are modeled after both Python str object methods and stringr (this is mentioned in the .str accessor source code). However, it is not always clear which methods accept a regex (like stringr) and which do not (like str object methods).

For example, .str.count() only accepts a regex, while .str.startswith() does not accept one at all. Other methods, like .str.contains(), accept a regex by default, but this can be disabled with the regex argument.

This is not a big issue in practice, but it warrants some caution (and a teaching strategy).