posted Jul 8, 2015, 6:15 PM by Onno Benschop
Our source code is stored in the following repositories:
|
posted Jul 4, 2015, 6:49 PM by Aisling Blackmore
http://codecondo.com/wp-content/uploads/2014/04/9-Free-Books-for-Learning-Data-Mining-Data-Analysis.jpg
|
posted Jul 4, 2015, 1:27 AM by Onno Benschop
[ updated Jul 5, 2015, 12:05 AM ]
- mkdir -p tmp ; for n in * ; do [ -f "$n" ] || continue ; tr -sc '[:alnum:]' '\n' < "$n" | sort -f | uniq -ic | sort -rn > "./tmp/$n" ; done
- for n in ./tmp/* ; do tr -s ' ' , < "$n" | sed "s|^|$n|" ; done | awk -F, '{print $1,$3,$2}' OFS=, > word_list.csv
- tr -sc '[:alnum:]' '\n' | sort -f | uniq -ic | sort -rn | tr -s ' ' , | sed -e "s|^|${url}|"
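The one-liners above can be tried end to end on throwaway data. This is only a sketch under a few assumptions: the two input files and their contents are made up, everything runs in a temporary directory, and `sed` uses `|` as its delimiter because the `./tmp/...` prefix itself contains slashes.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two tiny, made-up input files.
printf 'the cat and the hat\n' > a.txt
printf 'one more file\n' > b.txt

# Step 1: per-file word counts, most frequent first.
mkdir -p tmp
for n in *; do
    [ -f "$n" ] || continue          # skip the tmp directory itself
    tr -sc '[:alnum:]' '\n' < "$n" | sort -f | uniq -ic | sort -rn > "./tmp/$n"
done

# Step 2: flatten into CSV rows of the form path,word,count.
for n in ./tmp/*; do
    tr -s ' ' , < "$n" | sed "s|^|$n|"
done | awk -F, '{print $1,$3,$2}' OFS=, > word_list.csv

cat word_list.csv
```

Each row of `word_list.csv` holds the counted file's path, a word, and how often that word occurred in the file.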
|
posted Jul 3, 2015, 8:45 PM by Onno Benschop
- https://www.govhack.org/amazon-web-services/
|
posted Jul 3, 2015, 8:44 PM by Onno Benschop
- http://www.ni.com/newsletter/51649/en/
|
posted Jul 3, 2015, 7:25 PM by Onno Benschop
- http://dius.com.au/2014/01/07/eat-5-terabytes-lunch-hour-elastic-mapreduce/
|
posted Jul 3, 2015, 6:48 PM by Onno Benschop
[ updated Jul 3, 2015, 6:55 PM ]
To do a word count across a large data-set:
- https://aws.amazon.com/articles/Elastic-MapReduce/2273
- http://hci.stanford.edu/courses/cs448g/a2/
The awscli reference for EMR (rtfm):
- http://docs.aws.amazon.com/cli/latest/reference/emr/index.html
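A word count maps naturally onto map/reduce: a mapper emits one record per word, the framework sorts the records by key, and a reducer sums the counts for each key. The following is a minimal local sketch of that idea, not the code from the linked articles. The function names `mapper` and `reducer` are hypothetical, and a plain `sort` stands in for the shuffle that a cluster would perform.

```shell
# mapper: read text on stdin, emit one "word<TAB>1" line per word (lower-cased).
mapper() {
    tr -sc '[:alnum:]' '\n' | tr '[:upper:]' '[:lower:]' | awk 'NF { print $0 "\t1" }'
}

# reducer: read key-sorted "word<TAB>count" lines, print one total per word.
reducer() {
    awk -F '\t' '
        NR > 1 && $1 != key { print key "\t" total; total = 0 }
        { key = $1; total += $2 }
        END { if (NR > 0) print key "\t" total }
    '
}

# Local dry run: sort plays the role of the shuffle between map and reduce.
printf 'The cat and the hat\n' | mapper | LC_ALL=C sort | reducer
```

On a real cluster the two functions would live in separate scripts handed to a streaming step. The awscli reference linked above documents the options for that.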
|
posted Jul 3, 2015, 6:45 PM by Onno Benschop
To deploy MongoDB within AWS for massively parallel performance and storage, I read these documents:
- https://d0.awsstatic.com/whitepapers/AWS_NoSQL_MongoDB.pdf
- https://s3.amazonaws.com/quickstart-reference/mongodb/latest/doc/MongoDB_on_the_AWS_Cloud.pdf
|
posted Jul 3, 2015, 6:43 PM by Onno Benschop
To use Elastic MapReduce on Debian, install the AWS command line tools:
- aptitude install python-pip
- pip install awscli
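After the install it is worth checking that the `aws` command actually ended up on the PATH. A small sketch, where `have` is a hypothetical helper and not part of awscli:

```shell
# have: hypothetical helper, succeeds if the named command is on PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

if have aws; then
    aws --version        # print the installed awscli version
    # aws configure      # interactive: access key, secret key, region, output format
    # aws emr help       # list the EMR subcommands
else
    echo "aws not found on PATH; re-check the pip install step" >&2
fi
```

The two commented-out commands are the usual next steps: `aws configure` stores credentials and a default region, and `aws emr help` lists the EMR subcommands.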
|