Add guide for R #38
If requests mostly involve compute-heavy operations (e.g. matrix multiplications, as opposed to fetching data from online databases), it is recommended to limit the number of parallel requests to the number of hardware threads or physical cores in the machine; otherwise, requests will compete for the same CPU resources, causing slowdowns and lower throughput. Likewise, if using [Kubernetes](https://kubernetes.io) (also known as 'k8s'), avoid allocating less than a full CPU core to a compute-heavy pod, and avoid fractional core allocations.
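A minimal sketch of capping a worker pool at the core count, using only the base `parallel` package. `heavy_task` is a hypothetical stand-in for a compute-heavy request handler; in an actual API server the same cap would apply to whatever pool of workers serves requests. `detectCores(logical = FALSE)` is not honoured on every platform and returns `NA` when the count is unknown, hence the fallback.

```r
library(parallel)

# Cap the pool at the number of physical cores. `logical = FALSE` is not
# honoured on every platform, and detectCores() returns NA when unknown.
n_workers <- detectCores(logical = FALSE)
if (is.na(n_workers)) n_workers <- 1L

# Hypothetical compute-heavy task standing in for a request handler.
heavy_task <- function(i) {
  m <- matrix(runif(300 * 300), nrow = 300)
  sum(m %*% m)
}

# One worker process per core: adding more workers than cores would only
# make the tasks compete for the same CPUs.
cl <- makeCluster(n_workers)
results <- parLapply(cl, seq_len(16), heavy_task)
stopCluster(cl)

length(results)
```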
## Data frame operations
For this section, could you provide a compact summary or decision table (e.g. by data size, operation type, memory constraints) that helps answer "which one should I pick?"
Added a table based on operation type. For data size, it'd be quite hard to make recommendations like that, because it depends a lot more on what operations are done with that data.
Those sparse objects will be accepted as input by many modeling-related packages, such as `glmnet`, `xgboost`, `ranger`, `rsparse` and others, which have routines to operate efficiently on them.
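A minimal sketch, assuming made-up data with a high-cardinality factor: `Matrix::sparse.model.matrix()` builds the design matrix directly as a `dgCMatrix`, so the dense one-hot encoding is never materialised, and the result can be passed straight to `glmnet`. The `glmnet` call is guarded so the snippet still runs where that package is not installed.

```r
library(Matrix)

# Hypothetical data: a high-cardinality factor, whose one-hot encoding
# is mostly zeros, plus one numeric column.
set.seed(1)
n <- 2000
df <- data.frame(
  category = factor(sample(paste0("c", 1:200), n, replace = TRUE)),
  x = rnorm(n)
)
y <- rnorm(n)

# Build the design matrix directly in sparse (dgCMatrix) form.
# `- 1` drops the intercept column, since glmnet fits its own intercept.
X <- sparse.model.matrix(~ category + x - 1, data = df)
class(X)

# glmnet accepts a dgCMatrix as `x`; guarded in case it is not installed.
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(X, y)
}
```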
As a general rule, sparse representations only start being advantageous when fewer than 10% of the entries are non-zero, but the exact threshold at which switching becomes optimal varies a lot by use-case. If fewer than 1% of the entries are non-zero, however, a regular dense representation is very unlikely to be more efficient, provided the sparse format is supported.
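A rough sketch of applying this rule of thumb, assuming a synthetic matrix with exactly 1% non-zeros: measure the fraction of non-zero entries, convert to a sparse `dgCMatrix` only below a cut-off, and compare memory footprints. The 10% cut-off here is just the heuristic above, not a tuned value; the right threshold depends on the operations performed afterwards.

```r
library(Matrix)

# Hypothetical example: a 1000 x 500 matrix with exactly 1% non-zeros.
set.seed(42)
dense <- matrix(0, nrow = 1000, ncol = 500)
nz <- sample(length(dense), size = 5000)
dense[nz] <- rnorm(length(nz))

# Fraction of non-zero entries.
density <- mean(dense != 0)

# Rough cut-off from the rule of thumb; tune it per use-case.
X <- if (density < 0.10) as(dense, "CsparseMatrix") else dense

density  # 0.01
class(X)
format(object.size(dense), units = "Kb")
format(object.size(X), units = "Kb")
```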
Please fix the following typos:
- modifyin → modifying (line 3)
- environmnet → environment (line 86)
- sytem (in “sytem level”) → system (line 118)
- onMKL → oneMKL (line 122)
- apriori → a priori (line 159)
- PlumbeR → plumber (line 268)
- constitude → constitute (line 270)
Fixed, but:
- There's no typo in line 3.
- PlumbeR is how the authors named the library being referenced.
rsiyer-intel left a comment:
Thanks for the changes!
Adds a guide for optimizing R workflows.