Title: | Time-Weighted Dynamic Time Warping |
---|---|
Description: | Implements Time-Weighted Dynamic Time Warping (TWDTW), a measure for quantifying time series similarity. The TWDTW algorithm, described in Maus et al. (2016) <doi:10.1109/JSTARS.2016.2517118> and Maus et al. (2019) <doi:10.18637/jss.v088.i05>, is applicable to multi-dimensional time series of various resolutions. It is particularly suitable for comparing time series with seasonality for environmental and ecological data analysis, covering domains such as remote sensing imagery, climate data, hydrology, and animal movement. The 'twdtw' package offers a user-friendly 'R' interface, efficient 'Fortran' routines for TWDTW calculations, flexible time weighting definitions, as well as utilities for time series preprocessing and visualization. |
Authors: | Victor Maus [aut, cre] |
Maintainer: | Victor Maus <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0-2 |
Built: | 2024-10-25 04:47:06 UTC |
Source: | https://github.com/vwmaus/twdtw |
This function takes a date or datetime and converts it to a numeric cycle. The cycle can be specified in units of years, months, days, hours, minutes, or seconds. When cycle_length is a string, time_scale only changes the unit in which the result is expressed. When cycle_length is numeric, time_scale and origin are used to compute the elapsed time.
date_to_numeric_cycle(x, cycle_length, time_scale, origin = NULL)
x |
A vector of dates or datetimes to convert. If not of type Date or POSIXct, the function attempts to convert it. |
cycle_length |
The length of the cycle. Can be a numeric value or a string specifying the units ('year', 'month', 'day', 'hour', 'minute', 'second'). When numeric, the cycle length is in the same units as time_scale. When a string, it specifies the time unit of the cycle. |
time_scale |
Specifies the time scale for the conversion. Must be one of 'year', 'month', 'day', 'hour', 'minute', 'second'. When cycle_length is a string, time_scale changes the unit in which the result is expressed. When cycle_length is numeric, time_scale is used to compute the elapsed time in seconds. |
origin |
For numeric cycle_length, the origin must be specified. This is the point from which the elapsed time is computed. Must be of the same class as x. |
The numeric cycle value(s) corresponding to x.
date_to_numeric_cycle(Sys.time(), "year", "day") # Returns the day of the year
date_to_numeric_cycle(Sys.time(), "day", "hour") # Returns the hour of the day
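For intuition, the string-cycle case can be approximated in base R. This is a minimal sketch, not the package's implementation: the helper names day_of_year and hour_of_day are hypothetical, and the one-based day count and UTC time zone are assumptions that the package may not share.

```r
# Hypothetical base-R helpers; NOT the twdtw implementation.
day_of_year <- function(x) {
  # POSIXlt stores the day of the year zero-based in $yday, so add 1
  as.POSIXlt(x, tz = "UTC")$yday + 1
}

hour_of_day <- function(x) {
  # Express the time of day as a fractional number of hours
  lt <- as.POSIXlt(x, tz = "UTC")
  lt$hour + lt$min / 60 + lt$sec / 3600
}

day_of_year(as.Date("2023-03-01"))                          # 60
hour_of_day(as.POSIXct("2023-07-15 12:30:00", tz = "UTC"))  # 12.5
```

Mapping timestamps onto a cycle like this is what lets observations from different years be compared by season rather than by absolute date.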
This function returns the maximum possible value that a specific time component can take, given a cycle length and scale.
max_cycle_length(cycle_length, time_scale)
cycle_length |
A character string indicating the larger unit of time. It must be one of "year", "month", "day", "hour", "minute". |
time_scale |
A character string indicating the smaller unit of time, which is a division of the cycle_length. |
The function returns the maximum possible value that the time_scale can take within one cycle_length.
max_cycle_length("year", "month") # Maximum months in a year: 12
max_cycle_length("day", "minute") # Maximum minutes in a day: 1440
max_cycle_length("year", "day") # Maximum days in a year: 366
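One way to arrive at these maxima is a lookup of the longest possible instance of each unit. The sketch below is illustrative only: max_cycle_sketch is a hypothetical name, and the choice of a 366-day year and a 31-day month is an assumption. It reproduces the three values in the examples above, but the package may handle other unit pairs differently.

```r
# Hypothetical helper; NOT the twdtw implementation.
max_cycle_sketch <- function(cycle_length, time_scale) {
  # Seconds in the longest possible instance of each unit
  # (assumes a 366-day leap year and a 31-day month)
  secs <- c(year = 366 * 86400, month = 31 * 86400, day = 86400,
            hour = 3600, minute = 60, second = 1)
  stopifnot(cycle_length %in% names(secs), time_scale %in% names(secs))

  # Months do not divide a year evenly in seconds, so handle them explicitly
  if (cycle_length == "year" && time_scale == "month") return(12)

  # time_scale must be a strictly smaller unit than cycle_length
  stopifnot(secs[[cycle_length]] > secs[[time_scale]])
  secs[[cycle_length]] / secs[[time_scale]]
}

max_cycle_sketch("year", "month")  # 12
max_cycle_sketch("day", "minute")  # 1440
max_cycle_sketch("year", "day")    # 366
```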
This function visualizes the Time-Weighted Dynamic Time Warping cost matrix.
plot_cost_matrix(x, ...)
x |
An object of class 'twdtw' including internal data. |
... |
Additional arguments passed to |
An image plot of the TWDTW cost matrix. The x-axis represents the time series x, and the y-axis represents the time series y. The cost matrix is color-coded, with darker shades indicating higher costs and lighter shades indicating lower costs. The function returns no object; the plot is drawn directly on the active graphics device.
# Create a time series
n <- 23
t <- seq(0, pi, length.out = n)
d <- seq(as.Date('2020-09-01'), length.out = n, by = "15 day")
x <- data.frame(time = d, v1 = sin(t)*2 + runif(n))

# Shift time by 30 days
y <- data.frame(time = d + 30, v1 = sin(t)*2 + runif(n))

# Widen the x-axis so the shifted series y is fully visible
plot(x, type = "l", xlim = range(c(d, d + 30)))
lines(y, col = "red")

# Call twdtw using output = 'internals'
twdtw_obj <- twdtw(x, y,
                   cycle_length = 'year',
                   time_scale = 'day',
                   time_weight = c(steepness = 0.1, midpoint = 50),
                   output = 'internals')

plot_cost_matrix(twdtw_obj)
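To show the kind of picture this produces without the package, the sketch below builds a small time-weighted local cost matrix in base R and defines a plotting helper around graphics::image(). All data and the name plot_cost_sketch are hypothetical. Which matrix the package actually draws is not stated here, so using the local cost is an assumption, as are the logistic parameters 0.1 and 50.

```r
# Hypothetical one-dimensional series and observation times (in days)
xv <- sin(seq(0, pi, length.out = 10)); xt <- seq(0, 135, by = 15)
yv <- sin(seq(0, pi, length.out = 10)); yt <- xt + 30

# Time-weighted local cost for every pair (i, j), using a logistic weight
dist <- abs(outer(xv, yv, "-"))
el   <- abs(outer(xt, yt, "-"))
cost <- dist + 1 / (1 + exp(-0.1 * (el - 50)))

plot_cost_sketch <- function(cost) {
  # Reversed grey palette: low costs are light, high costs are dark
  image(seq_len(nrow(cost)), seq_len(ncol(cost)), cost,
        col = rev(gray.colors(12)),
        xlab = "x index", ylab = "y index")
}
```

Calling plot_cost_sketch(cost) draws the matrix on the active device, with the low-cost region showing as a light band.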
Print method for twdtw class
## S3 method for class 'twdtw'
print(x, ...)
x |
An object of class 'twdtw'. |
... |
Arguments passed to |
This function returns a textual representation of the twdtw object, which is printed directly to the console.
If x is a list, the function prints a summary of matches and omits twdtw's internal data; see names(x).
If x is not a list, it prints the content of x, i.e. either a matrix with all matches or the lowest twdtw distance.
This function converts a vector of date-like strings to Date or POSIXct format, depending on the format of the input strings. It checks if the input is already in Date or POSIXct format and performs the conversion accordingly.
to_date_time(x)
x |
A vector of strings or objects to be converted. |
A vector of dates or datetimes.
dates <- c("2023-07-15", "2023-07-16")
datetimes <- c("2023-07-15 12:30:00", "2023-07-16 13:45:00")
to_date_time(dates)
to_date_time(datetimes)
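The dispatch described above (pass through values that are already dates, otherwise parse by string shape) might look roughly like the following in base R. This is a sketch under stated assumptions: to_date_time_sketch is a hypothetical name, only the two ISO-style formats from the examples are handled, and datetimes are parsed as UTC. The package's own parsing rules may be broader.

```r
# Hypothetical helper; NOT the twdtw implementation.
to_date_time_sketch <- function(x) {
  # Already a date or datetime: return unchanged
  if (inherits(x, c("Date", "POSIXct"))) return(x)

  x <- as.character(x)
  # Date-only strings look like "YYYY-MM-DD"; anything else is
  # treated as a "YYYY-MM-DD HH:MM:SS" datetime
  if (all(grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x))) {
    as.Date(x, format = "%Y-%m-%d")
  } else {
    as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  }
}

class(to_date_time_sketch(c("2023-07-15", "2023-07-16")))
class(to_date_time_sketch("2023-07-15 12:30:00"))
```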
This function calculates the Time-Weighted Dynamic Time Warping (TWDTW) distance between two time series.
twdtw(x, y, time_weight, cycle_length, time_scale, ...)

## S3 method for class 'data.frame'
twdtw(
  x,
  y,
  time_weight,
  cycle_length,
  time_scale,
  origin = NULL,
  index_column = "time",
  max_elapsed = Inf,
  output = "distance",
  version = "f90",
  ...
)

## S3 method for class 'matrix'
twdtw(
  x,
  y,
  time_weight,
  cycle_length,
  time_scale = NULL,
  index_column = 1,
  max_elapsed = Inf,
  output = "distance",
  version = "f90",
  ...
)
x |
A data.frame or matrix representing time series. |
y |
A data.frame or matrix representing a labeled time series (reference). |
time_weight |
A numeric vector with length two (steepness and midpoint of logistic weight) or a function. See details. |
cycle_length |
The length of the cycle. Can be a numeric value or a string specifying the units ('year', 'month', 'day', 'hour', 'minute', 'second'). When numeric, the cycle length is in the same units as time_scale. When a string, it specifies the time unit of the cycle. |
time_scale |
Specifies the time scale for the conversion. Must be one of 'year', 'month', 'day', 'hour', 'minute', 'second'. When cycle_length is a string, time_scale changes the unit in which the result is expressed. When cycle_length is numeric, time_scale is used to compute the elapsed time in seconds. |
... |
Ignored. |
origin |
For numeric cycle_length, the origin must be specified. This is the point from which the elapsed time is computed. Must be of the same class as x. |
index_column |
(optional) The column name of the time index for data.frame inputs. Defaults to "time". For matrix input, an integer indicating the column with the time index. Defaults to 1. |
max_elapsed |
Numeric value constraining the TWDTW calculation to the lower band given by a maximum elapsed time. Defaults to Inf. |
output |
A character string defining the output. It must be one of 'distance', 'matches', 'internals'. Defaults to 'distance'. 'distance' will return the lowest TWDTW distance between x and y. |
version |
A string identifying the version of TWDTW implementation. Options are 'f90' for Fortran 90, 'f90goto' for Fortran 90 with goto statements, or 'cpp' for C++ version. Defaults to 'f90'. See details. |
TWDTW calculates a time-weighted version of DTW by modifying each element of the DTW's local cost matrix (see details in Maus et al. (2016) and Maus et al. (2019)). The default time weight is calculated using a logistic function that adds a weight to each pair of observations in the time series x and y based on the time difference between observations, such that

$$tw(dist_{i,j}) = dist_{i,j} + \frac{1}{1 + e^{-\alpha (el_{i,j} - \beta)}}$$

Where:
$tw$ is the time-weight function
$dist_{i,j}$ is the Euclidean distance between the i-th element of x and the j-th element of y in a multi-dimensional space
$el_{i,j}$ is the time elapsed between the i-th element of x and the j-th element of y
$\alpha$ and $\beta$ are the steepness and midpoint of the logistic function, respectively
The logistic function is implemented as the default option in the C++ and Fortran versions of the code. To use the native implementation, the steepness and midpoint must be provided as a numeric vector of length two using the argument time_weight. This implementation provides high processing performance.
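The logistic weight can be written as a short R function to see how it behaves. This is a sketch for illustration, not the native routine: the name logistic_time_weight is hypothetical, and the defaults are simply the steepness and midpoint used in the examples of this manual.

```r
# Hypothetical R version of the logistic time weight (illustration only)
logistic_time_weight <- function(dist, el, steepness = 0.1, midpoint = 50) {
  # Add a penalty between 0 and 1 that grows with the elapsed time
  dist + 1 / (1 + exp(-steepness * (el - midpoint)))
}

logistic_time_weight(dist = 1, el = 50)                 # 1.5 (penalty is 0.5 at the midpoint)
logistic_time_weight(dist = 1, el = c(0, 25, 75, 100))  # penalty rises with el
```

The penalty is small for observations close in time, equals 0.5 at the midpoint, and approaches 1 for large elapsed times. The steepness controls how abrupt that transition is.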
The time_weight argument also accepts a function defined in R, allowing the user to define a different weighting scheme. However, passing a function to time_weight can degrade processing performance: it can be up to 3x slower than using the default logistic time weight.
A time-weight function passed to time_weight must receive two numeric arguments and return a single numeric value. The first argument received is the Euclidean distance and the second is the elapsed time. For example,
time_weight = function(dist, el) dist + 0.1*el
defines a linear weighting scheme with a slope of 0.1.
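To make the role of such a function concrete, the sketch below plugs it into a plain DTW recursion written in base R. It is a deliberately simplified illustration and not the package's algorithm. The name twdtw_sketch is hypothetical, the series are one-dimensional, elapsed time is a plain absolute difference rather than a cyclic one, and the whole of x is aligned to the whole of y instead of searching for matches.

```r
# Simplified, hypothetical illustration; NOT the twdtw implementation.
twdtw_sketch <- function(xv, xt, yv, yt, time_weight) {
  n <- length(xv); m <- length(yv)

  # Local cost: time-weighted distance for every pair (i, j)
  local <- matrix(0, n, m)
  for (i in seq_len(n)) for (j in seq_len(m)) {
    local[i, j] <- time_weight(abs(xv[i] - yv[j]), abs(xt[i] - yt[j]))
  }

  # Accumulated cost with the classic DTW recursion (padded with Inf)
  acc <- matrix(Inf, n + 1, m + 1)
  acc[1, 1] <- 0
  for (i in seq_len(n)) for (j in seq_len(m)) {
    acc[i + 1, j + 1] <- local[i, j] +
      min(acc[i, j], acc[i, j + 1], acc[i + 1, j])
  }
  acc[n + 1, m + 1]
}

linear_weight <- function(dist, el) dist + 0.1 * el

# Identical values observed 10 time units apart: each aligned pair costs 0.1 * 10
twdtw_sketch(c(1, 2, 3), c(0, 1, 2), c(1, 2, 3), c(10, 11, 12), linear_weight)  # 3
```

With identical values and identical times the result is 0, so the distance in the usage line comes entirely from the time penalty.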
The Fortran 90 versions of twdtw are usually faster than the C++ version. The 'f90goto' version, which uses goto statements, is slightly quicker than the 'f90' version, which uses while and for loops.
You can use the max_elapsed parameter to limit the TWDTW calculation to a maximum elapsed time. This means it will skip comparisons between pairs of observations in x and y that are far apart in time. Be careful, though: if max_elapsed is set too low, it could change the results. It is important to try out different settings for your specific problem.
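The effect of max_elapsed can be pictured as a mask over all pairs of observations. The sketch below uses hypothetical observation times and assumes that a pair is kept when its elapsed time is less than or equal to max_elapsed. Whether the package treats the boundary as inclusive is not stated here.

```r
# Hypothetical observation times (days since a common origin)
xt <- c(0, 15, 30, 45)
yt <- c(30, 45, 60, 75)

# Elapsed time between every pair (i, j)
el <- abs(outer(xt, yt, "-"))

# Assumption: pairs farther apart than max_elapsed are skipped
max_elapsed <- 30
allowed <- el <= max_elapsed

sum(allowed)     # 10 of the 16 pairs are still compared
```

Shrinking max_elapsed removes more pairs and saves more work. If it is set too low, the pairs needed for a good alignment may be excluded, which is why the result can change.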
An S3 object of class twdtw whose content depends on output. If output = 'distance', a numeric value representing the TWDTW distance between the two time series. If output = 'matches', a numeric matrix of all TWDTW matches; for each match, the starting index, ending index, and distance are returned. If output = 'internals', a list of all TWDTW internal data.
Maus, V., Camara, G., Cartaxo, R., Sanchez, A., Ramos, F. M., & de Moura, Y. M. (2016). A Time-Weighted Dynamic Time Warping Method for Land-Use and Land-Cover Mapping. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 9(8), 3729-3739. doi:10.1109/JSTARS.2016.2517118
Maus, V., Camara, G., Appel, M., & Pebesma, E. (2019). dtwSat: Time-Weighted Dynamic Time Warping for Satellite Image Time Series Analysis in R. Journal of Statistical Software, 88(5), 1-31. doi:10.18637/jss.v088.i05
# Create a time series
n <- 23
t <- seq(0, pi, length.out = n)
d <- seq(as.Date('2020-09-01'), length.out = n, by = "15 day")
x <- data.frame(time = d, v1 = sin(t)*2 + runif(n))

# Shift time by 30 days
y <- data.frame(time = d + 30, v1 = sin(t)*2 + runif(n))

# Widen the x-axis so the shifted series y is fully visible
plot(x, type = "l", xlim = range(c(d, d + 30)))
lines(y, col = "red")

# Calculate TWDTW distance between x and y using logistic weight
twdtw(x, y,
      cycle_length = 'year',
      time_scale = 'day',
      time_weight = c(steepness = 0.1, midpoint = 50))

# Pass a generic time-weight function (arguments: distance, elapsed time)
twdtw(x, y,
      cycle_length = 'year',
      time_scale = 'day',
      time_weight = function(dist, el) dist + 1.0 / (1.0 + exp(-0.1 * (el - 50))))

# Test other versions
twdtw(x, y,
      cycle_length = 'year',
      time_scale = 'day',
      time_weight = c(steepness = 0.1, midpoint = 50),
      version = 'f90goto')

twdtw(x, y,
      cycle_length = 'year',
      time_scale = 'day',
      time_weight = c(steepness = 0.1, midpoint = 50),
      version = 'cpp')