How to do row-based processing using Deedle (frame in, frame out)
I am trying to use Deedle to do row-based processing on a data frame, but I just can't get my head around the Deedle way of doing things.
Say I have a frame like this:
   Indicator1  Indicator2
1  100         200
2  300         500
3  -200        1000
and some rules that need to be applied to each indicator:
- if an indicator value is greater than 0 and less than 500, multiply it by 1.1
- if an indicator value is less than 0, make it NaN
I tried using the Frame.mapRow.... functions.
I know I can use
fun v ->
    let indVal = v.GetAs<int>("Indicator1")
    let newIndVal =
        match indVal with
        | ... // logic
        | ... // some other logic
    let indVal2 = v.GetAs<int>("Indicator2")
    let newIndVal2 =
        match indVal2 with
        | ... // logic
        | ... // some other logic
with Frame.mapRow....
but I am stuck on how to turn newIndVal and newIndVal2 back into a row, and eventually into a new data frame.
What I am trying to achieve is frame in, frame out. Also, I only know how to process the columns one by one (after retrieving them by index or name). If the logic to be applied is generic, is there a way to avoid applying it column by column?
An imperative (and really simple) way to do this with a 2D array in C or C# is:
- loop through the row dimension
  - loop through the column dimension
    - apply the rule to array[row, col] as a side effect
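For reference, the imperative loop described above might look like this in F# over a plain Array2D, with no Deedle involved. This is only a sketch: applyRule and applyInPlace are hypothetical names, and the rule is the one stated at the top of the question (a value strictly between 0 and 500 is scaled by 1.1, a negative value becomes NaN, anything else is left alone).

```fsharp
// Hypothetical helper: the question's two rules expressed as a plain function.
let applyRule (v: float) =
    if v > 0.0 && v < 500.0 then v * 1.1   // 0 < v < 500: scale by 1.1
    elif v < 0.0 then nan                   // negative: mark as NaN
    else v                                  // everything else is unchanged

// Hypothetical helper: visit every (row, col) cell and overwrite it in place.
let applyInPlace (data: float[,]) =
    for row in 0 .. Array2D.length1 data - 1 do
        for col in 0 .. Array2D.length2 data - 1 do
            data.[row, col] <- applyRule data.[row, col]

// Same numbers as the sample frame, as a 3x2 array.
let data = array2D [ [ 100.0; 200.0 ]; [ 300.0; 500.0 ]; [ -200.0; 1000.0 ] ]
applyInPlace data
printfn "%A" data
```

The mutation is the whole point of this version; a Deedle frame is immutable, so the equivalent there produces a new frame instead of changing cells.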
How can I achieve this in Deedle?
UPDATE:
Leaf Garland's suggestion works great if the calculation doesn't need to reference other columns in the same row. In my case I need to look at the data row by row, so I would like to use Frame.mapRows. I should have been clearer about the simplified requirements:
Say I have a frame like this:
   Indicator1  Indicator2
1  100         200
2  <Missing>   500
3  -200        1000
4  100         <Missing>
5  <Missing>   500
6  -200        100
For example, if Indicator1 is less than 300, the new Indicator2 value is Indicator2 + 5% * Indicator1.
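A minimal sketch of that cross-column rule with Frame.mapRows, assuming the sample data above (with nan standing in for <Missing>) and F# Interactive with the Deedle NuGet package. Instead of rebuilding whole rows, it computes only the new Indicator2 series and swaps it into a copy of the frame with Frame.replaceCol, which keeps the frame-in, frame-out shape. The treatment of missing values is an assumption of this sketch: if either input is missing the rule is skipped, and a missing Indicator2 stays missing.

```fsharp
#r "nuget: Deedle"
open Deedle

// Sample data from the update; nan in a float series is treated as a missing value.
let indicator1 = Series.ofObservations [ (1, 100.0); (2, nan); (3, -200.0); (4, 100.0); (5, nan); (6, -200.0) ]
let indicator2 = Series.ofObservations [ (1, 200.0); (2, 500.0); (3, 1000.0); (4, nan); (5, 500.0); (6, 100.0) ]
let frame = Frame.ofColumns [ "Indicator1" => indicator1; "Indicator2" => indicator2 ]

// Row-wise rule: if Indicator1 < 300, new Indicator2 = Indicator2 + 5% * Indicator1.
// TryGetAs returns an OptionalValue, so missing cells can be checked with HasValue.
let newIndicator2 =
    frame
    |> Frame.mapRows (fun _ row ->
        let ind1 = row.TryGetAs<float>("Indicator1")
        let ind2 = row.TryGetAs<float>("Indicator2")
        if ind1.HasValue && ind2.HasValue && ind1.Value < 300.0 then
            ind2.Value + 0.05 * ind1.Value
        // Rule not applicable (Indicator1 missing or >= 300): keep Indicator2.
        elif ind2.HasValue then
            ind2.Value
        // Indicator2 missing: leave it missing.
        else
            nan)

// Frame in, frame out: a new frame with the recomputed column swapped in.
let result = frame |> Frame.replaceCol "Indicator2" newIndicator2
result.Print()
```

Computing a single new column and replacing it avoids constructing row series at all, which is usually enough when only one column changes.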
I need to use something like
frame
|> Frame.mapRows (fun k v ->
    let var1 = v.Get("Indicator1")
    let var2 = v.Get("Indicator2")
    // run through the conditions and produce new var1 and var2
    // produce an ObjectSeries
    ...)
|> Frame.ofRows
The pseudocode above sounds simple, but I just can't figure out how to produce a proper ObjectSeries to recreate the frame.
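One way to get from per-row results back to a frame, sketched under assumptions (the sample data from the update with nan for <Missing>, and Deedle loaded via NuGet in F# Interactive): let the Frame.mapRows callback return an ordinary typed Series<string, float> built with the series function, then pass the resulting series of rows to Frame.ofRows. In this sketch no ObjectSeries is constructed for the output row; whether Frame.ofRows turns the nan values into missing cells is assumed rather than verified here.

```fsharp
#r "nuget: Deedle"
open Deedle

// Sample data from the update; nan stands in for <Missing>.
let indicator1 = Series.ofObservations [ (1, 100.0); (2, nan); (3, -200.0); (4, 100.0); (5, nan); (6, -200.0) ]
let indicator2 = Series.ofObservations [ (1, 200.0); (2, 500.0); (3, 1000.0); (4, nan); (5, 500.0); (6, 100.0) ]
let frame = Frame.ofColumns [ "Indicator1" => indicator1; "Indicator2" => indicator2 ]

// For each input row, build a new row as a Series<string, float> keyed by column name.
// Frame.mapRows then yields Series<int, Series<string, float>>, and Frame.ofRows
// reassembles that series of rows into a Frame<int, string>.
let newFrame =
    frame
    |> Frame.mapRows (fun _ row ->
        let ind1 = row.TryGetAs<float>("Indicator1")
        let ind2 = row.TryGetAs<float>("Indicator2")
        // Indicator1 is passed through unchanged (nan if it was missing).
        let newInd1 = if ind1.HasValue then ind1.Value else nan
        // Indicator2 follows the rule from the update when both inputs are present.
        let newInd2 =
            if ind1.HasValue && ind2.HasValue && ind1.Value < 300.0 then ind2.Value + 0.05 * ind1.Value
            elif ind2.HasValue then ind2.Value
            else nan
        series [ "Indicator1" => newInd1; "Indicator2" => newInd2 ])
    |> Frame.ofRows

newFrame.Print()
```

Building the whole row this way is the natural fit when several columns change together, because every new value for a row is produced in one place.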
I also noticed some behaviour of the mapRows function that I can't explain [SO question]: /questions/5114665/deedle-framemaprows-kak-pravilno-ego-ispolzovat-i-kak-pravilno-postroit-seriyu-obektov
Update:
Since posting the original question, I have used Deedle in C#. To my surprise, row-based calculations are very easy in C#, and the way the C# Frame.Rows property handles missing values is very different from the F# mapRows function. Below is a very simple example I used to check that the logic is correct; it may be useful to anyone looking for a similar application.
Things to note:
1. Rows did not drop the row in which both columns' values are missing.
2. Mean is smart enough to compute the mean from whichever data points are available.
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
using Deedle;
namespace TestDeedleRowProcessWithMissingValues
{
    class Program
    {
        static void Main(string[] args)
        {
            // s1 covers the last six days including today; NaN marks a missing value.
            var s1 = new SeriesBuilder<DateTime, double>(){
                {DateTime.Today.Date.AddDays(-5),10.0},
                {DateTime.Today.Date.AddDays(-4),9.0},
                {DateTime.Today.Date.AddDays(-3),8.0},
                {DateTime.Today.Date.AddDays(-2),double.NaN},
                {DateTime.Today.Date.AddDays(-1),6.0},
                {DateTime.Today.Date.AddDays(-0),5.0}
            }.Series;
            // s2 has no entry at all for today.
            var s2 = new SeriesBuilder<DateTime, double>(){
                {DateTime.Today.Date.AddDays(-5),10.0},
                {DateTime.Today.Date.AddDays(-4),double.NaN},
                {DateTime.Today.Date.AddDays(-3),8.0},
                {DateTime.Today.Date.AddDays(-2),double.NaN},
                {DateTime.Today.Date.AddDays(-1),6.0}
            }.Series;
            var f = Frame.FromColumns(new KeyValuePair<string, Series<DateTime, double>>[] {
                KeyValue.Create("s1",s1),
                KeyValue.Create("s2",s2)
            });
            s1.Print();
            f.Print();
            // Print every row as a series.
            f.Rows.Select(kvp => kvp.Value).Print();
            // 29/05/2015 12:00:00 AM -> series [ s1 => 10; s2 => 10]
            // 30/05/2015 12:00:00 AM -> series [ s1 => 9; s2 => <missing>]
            // 31/05/2015 12:00:00 AM -> series [ s1 => 8; s2 => 8]
            // 1/06/2015 12:00:00 AM -> series [ s1 => <missing>; s2 => <missing>]
            // 2/06/2015 12:00:00 AM -> series [ s1 => 6; s2 => 6]
            // 3/06/2015 12:00:00 AM -> series [ s1 => 5; s2 => <missing>]
            // Mean of each row, computed from the values that are present.
            f.Rows.Select(kvp => kvp.Value.As<double>().Mean()).Print();
            // 29/05/2015 12:00:00 AM -> 10
            // 30/05/2015 12:00:00 AM -> 9
            // 31/05/2015 12:00:00 AM -> 8
            // 1/06/2015 12:00:00 AM -> <missing>
            // 2/06/2015 12:00:00 AM -> 6
            // 3/06/2015 12:00:00 AM -> 5
            //Console.ReadLine();
        }
    }
}
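For comparison with the F# side, a rough counterpart of the C# row-mean check might look like the following. Assumptions: integer row keys 1..6 stand in for the dates, Deedle is loaded via NuGet in F# Interactive, and Stats.mean skips missing values the way the C# Mean() does. The all-missing row (key 4) is deliberately not asserted on, since that is exactly where the F# and C# behaviour is reported to differ.

```fsharp
#r "nuget: Deedle"
open Deedle

// Same shape as the C# example, with integer keys instead of dates (a simplification).
// nan marks a missing value; s2 has no entry for key 6 at all.
let s1 = Series.ofObservations [ (1, 10.0); (2, 9.0); (3, 8.0); (4, nan); (5, 6.0); (6, 5.0) ]
let s2 = Series.ofObservations [ (1, 10.0); (2, nan); (3, 8.0); (4, nan); (5, 6.0) ]
let f = Frame.ofColumns [ "s1" => s1; "s2" => s2 ]

// Each row arrives as an ObjectSeries<string>; As<float>() views it as a float series,
// and Stats.mean averages the values that are present in that row.
let rowMeans = f |> Frame.mapRows (fun _ row -> Stats.mean (row.As<float>()))
rowMeans.Print()
```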
Answer
You can map all the values in your frame using Frame.mapValues. Give it a function that takes your data type and returns the updated value.
let indicator1 = [100.0;300.0;-200.0] |> Series.ofValues
let indicator2 = [200.0;500.0;1000.0] |> Series.ofValues
let frame = Frame.ofColumns ["indicator1" => indicator1; "indicator2" => indicator2]
// val frame : Frame<int,string> =
//
// indicator1 indicator2
// 0 -> 100 200
// 1 -> 300 500
// 2 -> -200 1000
let update v =
    match v with
    | v when v < 500.0 && v > 0.0 -> v * 1.1
    | v when v < 0.0 -> nan
    | v -> v
let newFrame = frame |> Frame.mapValues update
// val newFrame : Frame<int,string> =
//
// indicator1 indicator2
// 0 -> 110 220
// 1 -> 330 500
// 2 -> <missing> 1000