Week 1 solutions for HWDSC
This assignment refreshes your pandas knowledge: you will do several groupbys and joins to solve the task.
0. Print the shape of the loaded dataframes and use the df.head function to print several rows. Examine the features you are given.
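A minimal sketch of this inspection step on a toy table. The dataframe name transactions, its columns, and its values are assumptions made for illustration; they are not taken from the course data.

```python
import pandas as pd

# Hypothetical stand-in for a loaded table; the column names are assumptions.
transactions = pd.DataFrame({
    "date": ["02.09.2014", "03.09.2014", "15.12.2014"],
    "shop_id": [25, 25, 31],
    "item_id": [100, 101, 100],
    "item_price": [199.0, 349.0, 199.0],
    "item_cnt_day": [1.0, -1.0, 2.0],
})

print(transactions.shape)    # (number of rows, number of columns)
print(transactions.head(2))  # first two rows
print(transactions.dtypes)   # column types, useful when examining features
```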
1. What was the maximum total revenue among all the shops in September, 2014?
- Hereinafter revenue refers to total sales minus value of goods returned.
Hints:
- Sometimes items are returned; find such examples in the dataset.
- It is handy to split the date field into [day, month, year] components and use df.year == 14 and df.month == 9 in order to select the target subset of dates.
- You may work with the date feature as with strings, or you may first convert it to a datetime type with the pd.to_datetime function, but do not forget to set the correct format argument.
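The hints above can be sketched on toy data as follows. The column names (shop_id, item_price, item_cnt_day), the day.month.year date format, and the convention that a return is stored as a negative count are all assumptions; adjust them to the actual dataset.

```python
import pandas as pd

# Toy data; column names and the date format are assumptions.
transactions = pd.DataFrame({
    "date": ["02.09.2014", "03.09.2014", "10.09.2014", "05.10.2014"],
    "shop_id": [25, 25, 31, 25],
    "item_price": [100.0, 100.0, 250.0, 80.0],
    "item_cnt_day": [2.0, -1.0, 1.0, 3.0],  # assumed: negative count = return
})

# Pass an explicit format so that day and month are not swapped.
transactions["date"] = pd.to_datetime(transactions["date"], format="%d.%m.%Y")

# Select September 2014.
mask = (transactions["date"].dt.year == 2014) & (transactions["date"].dt.month == 9)
september = transactions[mask].copy()

# Returns carry a negative count, so they subtract from the revenue.
september["revenue"] = september["item_price"] * september["item_cnt_day"]
revenue_per_shop = september.groupby("shop_id")["revenue"].sum()
max_revenue = revenue_per_shop.max()

print(revenue_per_shop)
print(max_revenue)
```

With a parsed date column the comparison uses the four-digit year; if the date is split as strings instead, the year component may need a different comparison value.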
2. What item category generated the highest revenue in summer 2014?
Submit the id of the category found. Here we call “summer” the period from June to August.
Hints:
- Note that for an object x of type pd.Series, x.idxmax() returns the index label of the maximum element; in recent pandas versions x.argmax() returns its integer position instead. A pd.Series can have a non-trivial index (not [1, 2, 3, ... ]).
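A small sketch of the join, the groupby, and the index lookup on toy tables. The table and column names (items, item_category_id, and so on) are assumptions for illustration, and the date filter for the summer months is omitted to keep the example short.

```python
import pandas as pd

# Toy tables; all names and values are assumptions.
transactions = pd.DataFrame({
    "item_id": [1, 2, 3, 1],
    "item_price": [100.0, 50.0, 300.0, 100.0],
    "item_cnt_day": [1.0, 4.0, 1.0, -1.0],  # assumed: negative count = return
})
items = pd.DataFrame({"item_id": [1, 2, 3], "item_category_id": [40, 55, 40]})

# Attach the category to every transaction, then aggregate revenue per category.
merged = transactions.merge(items, on="item_id", how="left")
merged["revenue"] = merged["item_price"] * merged["item_cnt_day"]
revenue_per_category = merged.groupby("item_category_id")["revenue"].sum()

best_category = revenue_per_category.idxmax()  # index label, here a category id
best_position = revenue_per_category.argmax()  # integer position (pandas 1.0+)

print(revenue_per_category)
print(best_category, best_position)
```

Because the grouped series is indexed by category id, the label returned by idxmax is the value to submit, while the position returned by argmax is not.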
3. How many items are there whose price stays constant (to the best of our knowledge) during the whole period of time?
- Let’s assume that items are returned at the same price at which they were sold.
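One possible way to count such items is to count the distinct prices seen for each item. This is a sketch on toy data, with item_id and item_price as assumed column names.

```python
import pandas as pd

# Toy data; column names are assumptions.
transactions = pd.DataFrame({
    "item_id": [1, 1, 2, 2, 3],
    "item_price": [100.0, 100.0, 50.0, 60.0, 300.0],
})

# Number of distinct prices observed per item.
n_prices = transactions.groupby("item_id")["item_price"].nunique()

# An item has a constant price if exactly one distinct price was observed.
num_constant = int((n_prices == 1).sum())
print(num_constant)
```

Under the assumption stated above, returns need no special treatment here because they reuse the sale price.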
4. What was the variance of the number of sold items per day sequence for the shop with shop_id = 25 in December, 2014? Do not count the items that were sold but returned back later.
- Fill the total_num_items_sold and days arrays, and plot the sequence with the code below.
- Then compute the variance. Remember, there can be differences in how you normalize variance (biased or unbiased estimate, see link). Compute the unbiased estimate (use the right value for the ddof argument in the var method of pd.Series or in np.var).
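The difference between the two estimates can be illustrated on toy data. The column names, the date format, and the assumption that returns are stored as negative counts (so that summing per day nets them out) are hypothetical. Note that the defaults differ: the pandas var method uses ddof=1, while np.var uses ddof=0.

```python
import numpy as np
import pandas as pd

# Toy data; column names, date format and values are assumptions.
transactions = pd.DataFrame({
    "date": ["01.12.2014", "01.12.2014", "02.12.2014",
             "03.12.2014", "03.12.2014", "01.12.2014"],
    "shop_id": [25, 25, 25, 25, 25, 31],
    "item_cnt_day": [2.0, 1.0, 5.0, 4.0, -1.0, 7.0],  # assumed: negative = return
})
transactions["date"] = pd.to_datetime(transactions["date"], format="%d.%m.%Y")

mask = (
    (transactions["shop_id"] == 25)
    & (transactions["date"].dt.year == 2014)
    & (transactions["date"].dt.month == 12)
)

# Net number of items sold per day; returns cancel out earlier sales.
daily = transactions[mask].groupby("date")["item_cnt_day"].sum()

total_num_items_sold = daily.values
days = daily.index

unbiased_pd = daily.var()                           # pandas default: ddof=1
unbiased_np = np.var(total_num_items_sold, ddof=1)  # unbiased, divides by n - 1
biased_np = np.var(total_num_items_sold)            # NumPy default: ddof=0

print(unbiased_pd, unbiased_np, biased_np)
```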