For this particular example,
- the variables of interest are stored as key:value pairs, and
- a single data cell can contain an unknown number of key:value pairs.
The objective is to separate these key-value pairs and store the values in corresponding key columns.
The hadleyverse packages, especially tidyr, stringr and magrittr, make this task fairly simple.
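As a minimal sketch of the idea (not necessarily the exact code used for this post), the snippet below splits each cell into one row per pair, splits each pair into a key and a value, and then spreads the keys into columns. The data frame `raw`, its columns `id` and `attrs`, and the `;` and `:` delimiters are all made up for illustration, and the sketch assumes a tidyr version that provides `pivot_wider()` (1.0.0 or later).

```r
library(tidyr)
library(stringr)
library(magrittr)

# Hypothetical input: each cell of `attrs` holds an unknown number of
# key:value pairs separated by semicolons.
raw <- data.frame(
  id    = c(1, 2, 3),
  attrs = c("color:red; size:L", "size:M", "color:blue; size:S; weight:2kg"),
  stringsAsFactors = FALSE
)

# Strip whitespace around the pair separator so keys come out clean.
raw$attrs <- str_replace_all(raw$attrs, "\\s*;\\s*", ";")

tidy <- raw %>%
  # One row per key:value pair.
  separate_rows(attrs, sep = ";") %>%
  # Split each pair at the first colon; any further colons stay in the value.
  separate(attrs, into = c("key", "value"), sep = ":", extra = "merge") %>%
  # One column per key; keys missing for a row are filled with NA.
  pivot_wider(names_from = key, values_from = value)

print(tidy)
```

Because the set of keys is discovered from the data, rows that lack a given key simply get `NA` in that column, so the number of pairs per cell does not need to be known in advance.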
Thanks steadyfish for this, I've been looking for something like this. How would you handle data that has, e.g., [7 breads, 5 pens, 10 eggs] as the entry for a particular shop, with product and quantity in one cell? You would then need to use 'mutate' to create a revenue column, e.g. mutate(mydata, revenue = total_product * price).
Assume you have thousands of shops, each reporting different products.
I am confused as to how to clean such messy data.