medstat package

Submodules

medstat.medstat module

Main module.

medstat.medstat.analyse_dataset(data: pandas.core.frame.DataFrame, hypothesis: List[tuple], threshold: float = 0.05, file=None)

Given a data set and a list of factor pairs whose independence you want to check, perform the appropriate test for each hypothesis. The results are printed to the screen and can optionally be saved to a file.

Args:

data: The data set.

hypothesis: List of 2-tuples, each containing the two factors to test

threshold (optional): p-value threshold under which the test result is considered significant

file (optional): A file to which a report of the results is written

Returns:

List: A list of dictionaries containing, for each test, the result, the contingency table, etc. (see the test_hypothesis output)

Examples:

>>> medstat.analyse_dataset(data,[('sex', 'age < 30'),
                                  ('sex', 'test_a')],
                            file='report.txt')
[{'contingency_table':
    age < 30  False  True  All
    sex
    Female       21    18   39
    Male         29    12   41
    All          50    30   80,
    'test': 'Chi-squared',
    'p-value': 0.18407215636751517,
    'significant': False},
    {'contingency_table':
    test_a  negative  positive  All
    sex
    Female        25        14   39
    Male          25        16   41
    All           50        30   80,
    'test': 'Chi-squared',
    'p-value': 0.9539453144224308,
    'significant': False}]
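
The p-values in the example above appear consistent with a chi-squared test with Yates continuity correction applied to the 2x2 part of each table (the "All" margins excluded). The following is a minimal pure-Python check under that assumption; the helper name chi2_yates_2x2 is hypothetical and not part of medstat, and the library's actual implementation may differ.

```python
import math


def chi2_yates_2x2(a, b, c, d):
    """Return the p-value of a Yates-corrected chi-squared test on [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Yates correction: shrink |ad - bc| by n/2, but never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    statistic = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(statistic / 2))


# Counts taken from the two contingency tables in the example above.
p_age = chi2_yates_2x2(21, 18, 29, 12)
p_test_a = chi2_yates_2x2(25, 14, 25, 16)
print(p_age, p_test_a)
```

Both values come out at roughly 0.184 and 0.954, in line with the 'p-value' entries shown in the example.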

medstat.medstat.test_hypothesis(data: pandas.core.frame.DataFrame, expression_1: str, expression_2: str, threshold: float = 0.05)

Perform a hypothesis test of independence between expression_1 and expression_2. Each expression can be a column name, in which case every category of the column is considered, or a boolean test.

Depending on the frequencies, either Fisher's exact test or a chi-squared test is performed.

Args:

data (pd.DataFrame): The data frame containing the data under study

expression_1 (str): A column name or a boolean test

expression_2 (str): A column name or a boolean test

threshold (float): p-value threshold under which the test is considered significant

Returns:

Dict: Contains the p-value, the contingency table, the test used, and whether the result is significant

Examples:

>>> medstat.test_hypothesis(data, 'sex', 'age < 30')
{'contingency_table':
    age < 30  False  True  All
    sex
    Female       26    22   48
    Male         24     8   32
    All          50    30   80,
 'test': 'Fisher',
 'p-value': 0.06541995357625573,
 'significant': False}
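
For readers unfamiliar with the 'Fisher' result reported above, here is a minimal pure-Python sketch of a two-sided Fisher exact test on a 2x2 table. It illustrates the general technique only: the function name fisher_exact_2x2 is hypothetical, and nothing here is taken from medstat's own code, which may compute the test differently.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)
    )


# Classic "lady tasting tea" table: 3 of 4 cups classified correctly.
p_tea = fisher_exact_2x2(3, 1, 1, 3)
print(p_tea)
```

The probabilities of all tables sharing the observed margins are enumerated, and those at most as likely as the observed table are summed, which is the usual two-sided definition.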

Module contents

Top-level package for medstat.

medstat.test_hypothesis(data: pandas.core.frame.DataFrame, expression_1: str, expression_2: str, threshold: float = 0.05)

Perform a hypothesis test of independence between expression_1 and expression_2. Each expression can be a column name, in which case every category of the column is considered, or a boolean test.

Depending on the frequencies, either Fisher's exact test or a chi-squared test is performed.

Args:

data (pd.DataFrame): The data frame containing the data under study

expression_1 (str): A column name or a boolean test

expression_2 (str): A column name or a boolean test

threshold (float): p-value threshold under which the test is considered significant

Returns:

Dict: Contains the p-value, the contingency table, the test used, and whether the result is significant

Examples:

>>> medstat.test_hypothesis(data, 'sex', 'age < 30')
{'contingency_table':
    age < 30  False  True  All
    sex
    Female       26    22   48
    Male         24     8   32
    All          50    30   80,
 'test': 'Fisher',
 'p-value': 0.06541995357625573,
 'significant': False}
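
The description above says each expression may be a column name or a boolean test. One way such expressions could be turned into a contingency table with pandas is sketched below. The helper evaluate and the sample data are hypothetical, and medstat's internals may well differ; the sketch only shows that pandas.crosstab with margins=True yields a table with the same "All" row and column as the example.

```python
import pandas as pd

# Small made-up data set, for illustration only.
data = pd.DataFrame({
    "sex": ["Female", "Female", "Male", "Male", "Male", "Female"],
    "age": [25, 41, 33, 29, 52, 38],
})


def evaluate(data, expression):
    """Return the column named by `expression`, or the result of evaluating it."""
    if expression in data.columns:
        return data[expression]
    # Boolean test such as 'age < 30'; keep the expression as the series name.
    return data.eval(expression).rename(expression)


table = pd.crosstab(
    evaluate(data, "sex"),
    evaluate(data, "age < 30"),
    margins=True,  # adds the 'All' row and column
)
print(table)
```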

medstat.analyse_dataset(data: pandas.core.frame.DataFrame, hypothesis: List[tuple], threshold: float = 0.05, file=None)

Given a data set and a list of factor pairs whose independence you want to check, perform the appropriate test for each hypothesis. The results are printed to the screen and can optionally be saved to a file.

Args:

data: The data set.

hypothesis: List of 2-tuples, each containing the two factors to test

threshold (optional): p-value threshold under which the test result is considered significant

file (optional): A file to which a report of the results is written

Returns:

List: A list of dictionaries containing, for each test, the result, the contingency table, etc. (see the test_hypothesis output)

Examples:

>>> medstat.analyse_dataset(data,[('sex', 'age < 30'),
                                  ('sex', 'test_a')],
                            file='report.txt')
[{'contingency_table':
    age < 30  False  True  All
    sex
    Female       21    18   39
    Male         29    12   41
    All          50    30   80,
    'test': 'Chi-squared',
    'p-value': 0.18407215636751517,
    'significant': False},
    {'contingency_table':
    test_a  negative  positive  All
    sex
    Female        25        14   39
    Male          25        16   41
    All           50        30   80,
    'test': 'Chi-squared',
    'p-value': 0.9539453144224308,
    'significant': False}]
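
The returned list can be post-processed like any list of dictionaries. The sketch below summarises results shaped like the example output, using only the 'test' and 'p-value' keys shown there. It assumes the results come back in the same order as the hypotheses, which the example suggests but the documentation does not state; summarise is a hypothetical helper, not a medstat function.

```python
# Results copied from the example above (contingency tables omitted for brevity).
results = [
    {"test": "Chi-squared", "p-value": 0.18407215636751517, "significant": False},
    {"test": "Chi-squared", "p-value": 0.9539453144224308, "significant": False},
]
hypothesis = [("sex", "age < 30"), ("sex", "test_a")]


def summarise(hypothesis, results, threshold=0.05):
    """Return one human-readable line per tested pair of factors."""
    lines = []
    for (left, right), result in zip(hypothesis, results):
        # A result counts as significant when its p-value is under the threshold.
        significant = result["p-value"] < threshold
        verdict = "significant" if significant else "not significant"
        lines.append(
            f"{left} vs {right}: {result['test']}, "
            f"p = {result['p-value']:.3f} ({verdict})"
        )
    return lines


summary = summarise(hypothesis, results)
for line in summary:
    print(line)
```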