I have the following dataframe in Spark (using PySpark):
- DT_BORD_REF: timestamp column
- COUNTRY_ALPHA: country Alpha-3 code
- working_day_flag: whether the date is a working day in that country (Y/N)
I need to add two fields:
- count of working days from the beginning of the month for that country (month to date)
- count of working days remaining until the end of that month for that country (month to go)
This looks like an application of a window function, but I can't figure out how to write it. Here is a sample of the data:
+-------------------+-------------+----------------+
|        DT_BORD_REF|COUNTRY_ALPHA|working_day_flag|
+-------------------+-------------+----------------+
|2021-01-01 00:00:00|          FRA|               N|
|2021-01-01 00:00:00|          ITA|               N|
|2021-01-01 00:00:00|          BRA|               N|
|2021-01-02 00:00:00|          BRA|               N|
|2021-01-02 00:00:00|          FRA|               N|
|2021-01-02 00:00:00|          ITA|               N|
|2021-01-03 00:00:00|          ITA|               N|
|2021-01-03 00:00:00|          BRA|               N|
|2021-01-03 00:00:00|          FRA|               N|
|2021-01-04 00:00:00|          BRA|               Y|
|2021-01-04 00:00:00|          FRA|               Y|
|2021-01-04 00:00:00|          ITA|               Y|
|2021-01-05 00:00:00|          FRA|               Y|
|2021-01-05 00:00:00|          BRA|               Y|
|2021-01-05 00:00:00|          ITA|               Y|
|2021-01-06 00:00:00|          ITA|               N|
|2021-01-06 00:00:00|          FRA|               Y|
|2021-01-06 00:00:00|          BRA|               Y|
|2021-01-07 00:00:00|          ITA|               Y|
+-------------------+-------------+----------------+