Как добавить значения по строкам в сгруппированном столбце
У меня есть некоторые данные датчика со 100 вводами данных в секунду. В последнем столбце указаны миллисекунды, которые на данный момент равны 10. Как можно суммировать по миллисекундам по строкам, сгруппированные по времени и дате.
testdata <- structure(list(local_date = c("26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017", "26-06-2017"),
local_time = c("13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:23", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24", "13:58:24" ),
ms = c(10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10)),
.Names = c("local_date", "local_time", "ms"), row.names = c(NA, -200L), class = c("data.table", "data.frame"))
Первые 100 строк имеют одинаковое время (13:58:23) и дату (26-06-2017), но все они имеют 10 миллисекунд. Результат должен содержать только одну запись с 10 миллисекундами в секунду, а следующие миллисекунды добавляются к предыдущим.
Этот фрагмент создаст результат с последовательностью:
testdata$ms = rep(seq(from = 10, to = 1000, by = 10), 2)
Но поскольку исходные данные не настолько чисты, я должен сгруппировать данные по дате и времени, а затем сложить миллисекунды по строкам.
Я бы предпочел data.table
решение, но dplyr
также будет работать нормально.
2 ответа
Похоже, вам нужно сгруппировать cumsum
:
library(dplyr)
testdata$ms2 = rep(seq(from = 10, to = 1000, by = 10), 2)
testdata %>%
group_by(local_date, local_time) %>%
mutate(cumsum_ms = cumsum(ms))
local_date local_time ms ms2 cumsum_ms
<chr> <chr> <dbl> <dbl> <dbl>
1 26-06-2017 13:58:23 10 10 10
2 26-06-2017 13:58:23 10 20 20
3 26-06-2017 13:58:23 10 30 30
4 26-06-2017 13:58:23 10 40 40
5 26-06-2017 13:58:23 10 50 50
И добавить data.table
версия:
testdata[, ms := cumsum(ms), by = .(local_time, local_date)]