I have a data.frame:
df <- structure(list(id = 1:3, vars = list("a", c("a", "b", "c"), c("b",
"c"))), .Names = c("id", "vars"), row.names = c(NA, -3L), class = "data.frame")
with a list column (each element is a character vector):
> str(df)
'data.frame': 3 obs. of 2 variables:
$ id : int 1 2 3
$ vars:List of 3
..$ : chr "a"
..$ : chr "a" "b" "c"
..$ : chr "b" "c"
I want to filter the data.frame according to setdiff(vars, remove_this):
library(dplyr)
library(tidyr)
res <- df %>% mutate(vars = lapply(vars, setdiff, "a"))
which gets me this:
> res
id vars
1 1
2 2 b, c
3 3 b, c
But to drop the rows where vars is character(0), I have to do something like:
res %>% unnest(vars) # and then do the equivalent of nest(vars) again after...
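For comparison, the unnest/nest round trip can probably be avoided by testing the length of each list element directly. The following is a minimal base-R sketch on the toy data above, not benchmarked on the real datasets; `remove_this` is the name used in the description, while `new_vars` and `keep` are names introduced only for this illustration.

```r
# Same toy data as above, built without structure() for readability.
df <- data.frame(id = 1:3)
df$vars <- list("a", c("a", "b", "c"), c("b", "c"))

remove_this <- "a"  # values to strip from every element of vars

# Apply setdiff() to each list element.
new_vars <- lapply(df$vars, setdiff, remove_this)

# lengths() returns the length of every list element in one call,
# so empty results (character(0)) can be detected without unnesting.
keep <- lengths(new_vars) > 0L

# Keep only rows with a non-empty result; other columns are carried along.
res <- df[keep, , drop = FALSE]
res$vars <- new_vars[keep]
```

Here `res` keeps the rows with id 2 and 3, and `vars` stays a list column, so no re-nesting step is needed.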
Actual datasets:
- 560K rows and 3800K rows, each with 10 more columns (to carry along).
(This is quite slow, which leads to the question...)
What is the fastest way to do this in R?
- Is there a dplyr / data.table / other faster method?
- How to do this with Rcpp?
UPDATE/EXTENSION:
- Can the column modification be done in place rather than by copying the lapply(vars,setdiff(... result?
- What's the most efficient way to filter out vars == character(0) if it must be a separate step?
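To make the two points above concrete, here is a rough sketch of what I imagine a data.table version could look like, assuming the data.table package is installed. It uses `setDT()` and `:=`, which assign by reference, on the same toy data. As far as I understand, `lapply()` still allocates a new list, so only the column assignment itself avoids copying the table. I have not timed this on the real datasets.

```r
library(data.table)

# Same toy data as above.
df <- data.frame(id = 1:3)
df$vars <- list("a", c("a", "b", "c"), c("b", "c"))

# Convert to a data.table by reference (the columns are not copied).
setDT(df)

# Replace the list column by reference. Wrapping the result in list()
# keeps it a list column even when the table has a single row.
df[, vars := list(lapply(vars, setdiff, "a"))]

# Separate filtering step: keep rows whose element is not character(0).
res <- df[lengths(vars) > 0L]
```

After this, `df` still has three rows (with an empty first element), and `res` holds only the rows with id 2 and 3.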