MaxwellLZH/dstools

Handy tools for data scientists

Available functions

metrics.calculate_relationship Determine whether y is positively related, negatively related, or unrelated to x

metrics.cosine_similarity Cosine similarity between two vectors

metrics.jarccard_index Calculate the Jaccard index (aka Jaccard similarity).

metrics.ks_score Calculate the Kolmogorov-Smirnov score

metrics.lift_table Create lift table given cutoff point or number of bins

metrics.psi Calculate the population stability index (PSI) given two arrays.
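
To make the PSI and Kolmogorov-Smirnov entries above concrete, here is a minimal NumPy sketch of both metrics. It is not the dstools implementation: the names `psi_sketch` and `ks_sketch`, their parameters, the quantile-based binning, and the `eps` clipping are all assumptions made for illustration, and the real `metrics.psi` and `metrics.ks_score` signatures may differ.

```python
import numpy as np


def psi_sketch(expected, actual, n_bins=10, eps=1e-6):
    """Population stability index between two 1-D samples (hypothetical helper)."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Interior bin edges are quantiles of the baseline ("expected") sample.
    quantiles = np.quantile(expected, np.linspace(0.0, 1.0, n_bins + 1))
    inner = np.unique(quantiles[1:-1])

    def share(x):
        # Assign each value to a bin; values outside the baseline range
        # fall into the first or last bin.
        idx = np.searchsorted(inner, x, side="right")
        counts = np.bincount(idx, minlength=inner.size + 1)
        # Clip so empty bins do not produce log(0) or division by zero.
        return np.clip(counts / x.size, eps, None)

    e, a = share(expected), share(actual)
    return float(np.sum((a - e) * np.log(a / e)))


def ks_sketch(y_true, y_score):
    """Max gap between the score CDFs of the two classes (hypothetical helper)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    thresholds = np.unique(y_score)
    pos = np.sort(y_score[y_true])
    neg = np.sort(y_score[~y_true])
    # Empirical CDF of each class evaluated at every observed score.
    cdf_pos = np.searchsorted(pos, thresholds, side="right") / pos.size
    cdf_neg = np.searchsorted(neg, thresholds, side="right") / neg.size
    return float(np.max(np.abs(cdf_pos - cdf_neg)))
```

In this sketch a PSI of zero means the two samples fall into the baseline bins in identical proportions, and a KS value of one means the scores separate the two classes completely.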

sklearn_extension.BaseEstimator Base class for all estimators in scikit-learn

sklearn_extension.Binning Base class for all Binning functionalities,

sklearn_extension.BorderlineSMOTE Over-sampling using Borderline SMOTE.

sklearn_extension.ChiSquareBinning No documentation found.

sklearn_extension.ClassifierMixin Mixin class for all classifiers in scikit-learn.

sklearn_extension.ConditionalWrapper A conditional wrapper that makes a Scikit-Learn transformer work on only part of the data

sklearn_extension.CorrelationRemover No documentation found.

sklearn_extension.EntropyBinning No documentation found.

sklearn_extension.EqualFrequencyBinning No documentation found.

sklearn_extension.EqualWidthBinning No documentation found.

sklearn_extension.IQROutlierRemover Removing outliers based on IQR,

sklearn_extension.IVBinning No documentation found.

sklearn_extension.IncrementalLogisticRegression Incremental Logistic Regression

sklearn_extension.Inspect A step that can be plugged into the pipeline to inspect the

sklearn_extension.KMeansSMOTE Apply KMeans clustering before over-sampling using SMOTE.

sklearn_extension.KSBinning No documentation found.

sklearn_extension.NormDistOutlierRemover Removing outliers assuming data is independent and follows a normal distribution

sklearn_extension.NotFittedError Exception class to raise if estimator is used before fitting.

sklearn_extension.OrdinalEncoder Similar to Scikit-Learn OrdinalEncoder but allows for arbitrary ordering in the columns,

sklearn_extension.Pipeline A dropin replacement for Scikit-learn Pipeline object that supports

sklearn_extension.QuantileOutlierRemover Removing outliers based on skewness threshold

sklearn_extension.RandomOverSampler Class to perform random over-sampling.

sklearn_extension.SMOTE Class to perform over-sampling using SMOTE.

sklearn_extension.SVMSMOTE Over-sampling using SVM-SMOTE.

sklearn_extension.SparsityRemover No documentation found.

sklearn_extension.StepwiseLogisticRegression Stepwise Logistic Regression

sklearn_extension.TreeBinner No documentation found.

sklearn_extension.WoeEncoder No documentation found.

sklearn_extension.equal_frequency_binning Shortcut for equal frequency binning on a Pandas.Series, returns

sklearn_extension.equal_width_binning Shortcut for equal width binning on a Pandas.Series, returns

sklearn_extension.iv Compute the IV stats for each feature and return a list of WoE values.

sklearn_extension.return_frame A class decorator for Scikit-Learn transformers

sklearn_extension.sort_columns_logistic Sort columns according to wald_chi2

sklearn_extension.sort_columns_tree Sort columns according to feature importance in tree method

sklearn_extension.woe Return a series mapping each feature value to its WoE stats
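
The WoE and IV entries above can be illustrated with a short pandas sketch. This is not the dstools code: `woe_iv_sketch` is a hypothetical name, the sign convention (log of event share over non-event share, with `1` marking the event) is an assumption because conventions differ between libraries, and the additive `smoothing` term is added here only to keep the logarithm finite for categories with no events or no non-events.

```python
import numpy as np
import pandas as pd


def woe_iv_sketch(feature, target, smoothing=0.5):
    """Per-category WoE and total IV for a binary target (hypothetical helper)."""
    df = pd.DataFrame({"x": list(feature), "y": list(target)})
    # Count events (y == 1) and rows per category.
    stats = df.groupby("x")["y"].agg(["sum", "count"])
    events = stats["sum"]
    non_events = stats["count"] - stats["sum"]
    k = len(stats)
    # Smoothed share of all events / non-events that fall in each category.
    event_share = (events + smoothing) / (events.sum() + smoothing * k)
    non_event_share = (non_events + smoothing) / (non_events.sum() + smoothing * k)
    # Assumed convention: WoE = ln(event share / non-event share).
    woe = np.log(event_share / non_event_share)
    # IV sums the share difference weighted by WoE over all categories.
    iv = float(((event_share - non_event_share) * woe).sum())
    return woe, iv
```

Under this convention a positive WoE marks a category where events are over-represented, and an IV near zero suggests the feature carries little information about the target.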

utils.capture_output Capture stdout and stderr as string.

utils.check_same_length A decorator that checks that all the arguments have the same length

utils.create_multilevel_index Create two-level multilevel index from given index names.

utils.find_duplicates Find duplicate elements in an iterable

utils.flatten_list Flatten a nested list regardless of the depth.

utils.get_stats Return a pstats.Stats object from a statement.

utils.groupby groupby(iterable, key=None) -> make an iterator that returns consecutive

utils.is_scalar_nan Tests if x is NaN

utils.iter_date Iterate over days

utils.limit_precision Limit the precision of a float number

utils.maybe_mkdir Create the directory if it does not exist.

utils.ngram Generate n-grams from an iterable.

utils.plot_distribution Show the plot for the specified distribution

utils.print_source_code Print the source code of an object.

utils.print_stats Print out the profiling detail from the statement sorted by *keys

utils.read_csv Read multiple CSV files and concatenate them row-wise

utils.read_excel Read multiple Excel files and concatenate them row-wise

utils.read_multiple_files No documentation found.

utils.read_sheets Read all the sheets in an Excel file and concatenate them row-wise

utils.return_default A decorator that checks the first argument and, if it meets the criteria, simply returns default_value

utils.set_default A decorator that checks the first argument and, if it meets the criteria, replaces it with default_value

utils.timeit A decorator that times the function and logs the information.

utils.today Return the date of today as a string.

utils.weighted_sum No documentation found.

utils.write_dict_to_excel Save a dictionary to an Excel file with each key being the sheet name
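
Two of the iteration helpers listed above, flattening nested lists and generating n-grams, can be sketched with the standard library alone. The functions below are illustrative stand-ins, not the dstools implementations: the names `flatten_sketch` and `ngram_sketch`, the choice to return generators, and the decision to treat only lists and tuples as nestable are all assumptions.

```python
from collections import deque
from itertools import islice


def flatten_sketch(nested):
    """Yield leaf items from lists/tuples nested to any depth (hypothetical helper)."""
    for item in nested:
        if isinstance(item, (list, tuple)):
            # Recurse into nested containers; strings are treated as leaves.
            yield from flatten_sketch(item)
        else:
            yield item


def ngram_sketch(iterable, n=2):
    """Yield consecutive n-item tuples from any iterable (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(iterable)
    # A bounded deque acts as a sliding window over the input.
    window = deque(islice(it, n), maxlen=n)
    if len(window) == n:
        yield tuple(window)
    for item in it:
        window.append(item)
        yield tuple(window)
```

Both are generators, so `list(flatten_sketch([1, [2, [3, [4]]], 5]))` materializes the flattened values, and an input shorter than `n` yields no n-grams.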
