The post How to create an Android mobile app with a deep learning AI model? appeared first on We provide R, Python, Statistics Online-Learning Course.
1. Define the AI Model’s Objective and Gather Data:
2. Develop and Train the Deep Learning Model:
3. Optimize and Convert the Model for Mobile Deployment:
Convert your trained model to a mobile-optimized format like TensorFlow Lite (LiteRT) for efficient execution on Android devices. This often involves quantization and pruning to reduce model size and improve inference speed.
If the model is too complex for on-device processing, deploy it to a cloud service (e.g., Google Cloud AI Platform, AWS SageMaker) and access it via APIs from your Android app.
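As an illustration of the conversion step, a minimal sketch in Python might look as follows. It builds a tiny stand-in Keras model (in practice you would load your own trained model; the file name my_model.tflite is just an example) and converts it with default dynamic-range quantization:

```python
import tensorflow as tf

# A tiny stand-in model; in practice you would load your trained model
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)),
                             tf.keras.layers.Dense(2, activation='softmax')])

# Convert to TensorFlow Lite; Optimize.DEFAULT enables dynamic-range
# quantization, which shrinks the model and speeds up inference
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Write the .tflite file that an Android app can bundle as an asset
with open('my_model.tflite', 'wb') as f:
    f.write(tflite_model)
```

The resulting .tflite file can then be bundled as an asset in the Android project and run with the TensorFlow Lite (LiteRT) Interpreter.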
4. Develop the Android Application:
5. Integrate the AI Model into the Android App:
6. Test and Optimize:
7. Monitor and Iterate:
The post Python Machine Learning Source Files appeared first on We provide R, Python, Statistics Online-Learning Course.
The post Install PyTorch on Windows appeared first on We provide R, Python, Statistics Online-Learning Course.
Before we run the PyTorch installation command in a prompt window, e.g. an Anaconda prompt, the CUDA Toolkit must be installed on your computer. The CUDA Toolkit can be downloaded from the following link: https://developer.nvidia.com/cuda-12-6-0-download-archive?target_os=Windows , or you can just google ‘install cuda‘.

The CUDA Toolkit lets the GPU in your computer run deep learning models much faster than the CPU. Next, download the exe file for the CUDA Toolkit.
However, before you can install and use CUDA on Windows, you must have Visual Studio installed first. Again, we can search ‘Visual Studio download‘ and easily find the latest version of Visual Studio Community, which is totally free. Next, install both Visual Studio and the CUDA Toolkit.

You can run the following command to confirm that the CUDA Toolkit is correctly installed: nvcc --version

After both Visual Studio and the CUDA Toolkit are successfully installed on your Windows operating system (my computer just has an older version of CUDA), we can run the PyTorch installation command in an Anaconda prompt window.
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu126
After a while, the torch and torchvision packages are installed, and you can use the following commands in an Anaconda prompt window to check that PyTorch is correctly installed, can be imported in Python, and can use the CUDA Toolkit.
(base) C:\Users\Wilso>python
Python 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:03:56) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>>
Once you see this output, you can start using PyTorch in Python for deep learning!
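Beyond checking torch.cuda.is_available(), a quick sanity test is to run a small computation on the selected device. This is a minimal sketch that falls back to the CPU when no GPU is present:

```python
import torch

# Pick the GPU when CUDA is usable, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(torch.__version__, '| running on', device)

# A tiny computation on the chosen device
x = torch.ones(3, device=device)
y = (x * 2).sum()
print(y.item())  # prints 6.0
```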
You can also watch the PyTorch installation video on our YouTube channel.
The post Topic Modeling using Latent Dirichlet Allocation with Python appeared first on We provide R, Python, Statistics Online-Learning Course.
The specific technique applied in topic modeling here is called Latent Dirichlet Allocation (LDA). LDA is a Bayesian statistical approach that tries to find groups of key words that frequently appear together across text examples. These key words represent the aspects of each topic.
In essence, LDA takes the bag-of-words matrix built from the preprocessed texts as input and decomposes it into two new matrices: a document-to-topic matrix and a word-to-topic matrix. Because the product of these two matrices approximates the input bag-of-words matrix, LDA effectively searches for topics that can reproduce the bag-of-words matrix with the lowest possible error.
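The decomposition idea can be illustrated with a toy NumPy example (all numbers below are made up): multiplying a small document-to-topic matrix by a word-to-topic matrix gives back an approximation of the bag-of-words matrix.

```python
import numpy as np

# Toy illustration (made-up numbers): 4 documents, 2 topics, 6 words
doc_topic = np.array([[0.9, 0.1],    # document 1 is mostly topic 1
                      [0.8, 0.2],
                      [0.1, 0.9],    # document 3 is mostly topic 2
                      [0.2, 0.8]])
topic_word = np.array([[5., 4., 3., 0., 0., 1.],   # topic 1 word weights
                       [0., 1., 0., 4., 5., 3.]])  # topic 2 word weights

# Their product approximates the 4 x 6 bag-of-words count matrix
approx_bow = doc_topic @ topic_word
print(approx_bow.shape)  # (4, 6)
```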
The following code snippet shows an application of topic modeling to IMDB movie review texts.
# Step 1: read the csv IMDB review data
# source from link 'http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz'
# has been downloaded and extracted.
import os
import sys
import numpy as np
import pandas as pd
import pyprind

basepath = 'aclImdb'
labels = {'pos': 1, 'neg': 0}
pbar = pyprind.ProgBar(50000, stream=sys.stdout)
df = pd.DataFrame()
for s in ('test', 'train'):
    for l in ('pos', 'neg'):
        path = os.path.join(basepath, s, l)
        for file in sorted(os.listdir(path)):
            with open(os.path.join(path, file),
                      'r', encoding='utf-8') as infile:
                txt = infile.read()
            df = pd.concat([df, pd.DataFrame([txt, labels[l]]).transpose()],
                           ignore_index=True)
            pbar.update()
df.columns = ['review', 'sentiment']
# Shuffling the DataFrame:
np.random.seed(0)
df = df.reindex(np.random.permutation(df.index))
# Saving the assembled data as a CSV file:
df.to_csv('movie_data.csv', index=False, encoding='utf-8')
df = pd.read_csv('movie_data.csv', encoding='utf-8')
# the following renaming is necessary on some computers:
df = df.rename(columns={"0": "review", "1": "sentiment"})
df.head(3)
#output: the first three review texts and their labels (note: in topic modeling we do not use the label information)
review sentiment
0 In 1974, the teenager Martha Moxley (Maggie Gr... 1
1 OK... so... I really like Kris Kristofferson a... 0
2 ***SPOILER*** Do not read this, if you think a... 0
#Step 2: create the bag-of-words matrix, a 50,000 x 5,000 matrix (50,000 texts, 5,000-word vocabulary)
from sklearn.feature_extraction.text import CountVectorizer

count = CountVectorizer(stop_words='english',
                        max_df=.1,
                        max_features=5000)
X = count.fit_transform(df['review'].values)
#Step 3: create the LDA model and train it with the bag-of-words matrix as input.
#We set 10 topics here, and each iteration uses all information from the matrix ('batch')
from sklearn.decomposition import LatentDirichletAllocation

lda = LatentDirichletAllocation(n_components=10,
                                random_state=123,
                                learning_method='batch')
X_topics = lda.fit_transform(X)
#Step 4: print the 5 most important words for each topic
n_top_words = 5
feature_names = count.get_feature_names_out()
for topic_idx, topic in enumerate(lda.components_):
    print(f'Topic {topic_idx + 1}:')
    print(' '.join([feature_names[i] for i in
                    topic.argsort()[:-n_top_words - 1:-1]]))
#output
Topic 1:
worst minutes awful script stupid
Topic 2:
family mother father children girl
Topic 3:
american war dvd music tv
Topic 4:
human audience cinema art sense
Topic 5:
police guy car dead murder
Topic 6:
horror house sex girl woman
Topic 7:
role performance comedy actor performances
Topic 8:
series episode war episodes tv
Topic 9:
book version original read novel
Topic 10:
action fight guy guys cool
Based on the 5 most important words for each topic, we may propose the following candidate topics for the IMDB movie review texts:
Generally bad movies (not really a topic category)
Movies about families
War movies
Art movies
Crime movies
Horror movies
Comedies
Movies somehow related to TV shows
Movies based on books
Action movies
#Step 5: to confirm our guess about the topics, we print the 3 review texts
#that have the highest probability of belonging to the topic 'Horror movies'.
horror = X_topics[:, 5].argsort()[::-1]
for iter_idx, movie_idx in enumerate(horror[:3]):
    print(f'\nHorror movie #{iter_idx + 1}:')
    print(df['review'][movie_idx][:300], '...')
#output
Horror movie #1:
House of Dracula works from the same basic premise as House of Frankenstein from the year before; namely that Universal's three most famous monsters; Dracula, Frankenstein's Monster and The Wolf Man are appearing in the movie together. Naturally, the film is rather messy therefore, but the fact that ...
Horror movie #2:
Okay, what the hell kind of TRASH have I been watching now? "The Witches' Mountain" has got to be one of the most incoherent and insane Spanish exploitation flicks ever and yet, at the same time, it's also strangely compelling. There's absolutely nothing that makes sense here and I even doubt there ...
Horror movie #3:
<br /><br />Horror movie time, Japanese style. Uzumaki/Spiral was a total freakfest from start to finish. A fun freakfest at that, but at times it was a tad too reliant on kitsch rather than the horror. The story is difficult to summarize succinctly: a carefree, normal teenage girl starts coming fac ...
You can also watch a video with more details of this topic model application on our YouTube channel.
The post Document sentiment classification using bag-of-words in Python appeared first on We provide R, Python, Statistics Online-Learning Course.
For online Python training registration, click here!
Sentiment classification is a supervised machine learning task and a subfield of natural language processing (NLP). With classification algorithms, such as logistic regression, text data can be trained against labels, e.g. positive and negative.
The main steps of a sentiment classification implementation are: loading and cleaning the text data, converting the texts into numeric features (for example tf-idf vectors), training a classifier, and evaluating it on held-out data.
In the following example, we show how to perform a sentiment classification task on movie review data from IMDB.
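Before the full IMDB example, here is a minimal toy sketch of the whole pipeline; the four texts and their labels are made up for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Made-up mini dataset: 1 = positive review, 0 = negative review
texts = ['a wonderful, touching film with a great cast',
         'absolutely terrible acting and a boring story',
         'great story, great cast, great direction',
         'boring, awful and terrible from start to finish']
labels = [1, 0, 1, 0]

# tf-idf features feeding a logistic regression classifier
clf = Pipeline([('vect', TfidfVectorizer()),
                ('clf', LogisticRegression(solver='liblinear'))])
clf.fit(texts, labels)
print(clf.predict(['a great film', 'terrible and boring']))
```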
After the data is downloaded and extracted, we load it into the Python working session.
import os
import sys
import numpy as np
import pandas as pd
import pyprind

basepath = 'aclImdb'
labels = {'pos': 1, 'neg': 0}
pbar = pyprind.ProgBar(50000, stream=sys.stdout)
df = pd.DataFrame()
for s in ('test', 'train'):
    for l in ('pos', 'neg'):
        path = os.path.join(basepath, s, l)
        for file in sorted(os.listdir(path)):
            with open(os.path.join(path, file),
                      'r', encoding='utf-8') as infile:
                txt = infile.read()
            df = pd.concat([df, pd.DataFrame([txt, labels[l]]).transpose()],
                           ignore_index=True)
            pbar.update()
Then we can show the data contents after the rows of the data frame are shuffled.
df.columns = ['review', 'sentiment']
np.random.seed(0)
df = df.reindex(np.random.permutation(df.index))
df.head(3)
df.shape
The following code creates the tf-idf transformer and preprocesses the raw text data: we remove HTML tags and keep emoticons in the text. Several helper functions for tokenizing and stop-word removal are created as well.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfTransformer

tfidf = TfidfTransformer(use_idf=True, norm=None, smooth_idf=True)

def preprocessor(text):
    text = re.sub('<[^>]*>', '', text)
    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)',
                           text)
    text = (re.sub(r'[\W]+', ' ', text.lower()) +
            ' '.join(emoticons).replace('-', ''))
    return text

df['review'] = df['review'].apply(preprocessor)

porter = PorterStemmer()

def tokenizer(text):
    return text.split()

def tokenizer_porter(text):
    return [porter.stem(word) for word in text.split()]

nltk.download('stopwords')
stop = stopwords.words('english')
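To see what the preprocessor does, here it is again as a self-contained snippet applied to a small made-up sample string:

```python
import re

def preprocessor(text):
    # drop HTML tags, collect emoticons, lowercase, strip punctuation,
    # then re-append the emoticons without their '-' noses
    text = re.sub('<[^>]*>', '', text)
    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)', text)
    text = (re.sub(r'[\W]+', ' ', text.lower()) +
            ' '.join(emoticons).replace('-', ''))
    return text

print(preprocessor('</a>This :) is :( a test :-)!'))
# prints: this is a test :) :( :)
```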
Next we can train the model using logistic regression. The search for the best hyperparameter combination is carried out with grid search.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X_train = df.loc[:25000, 'review'].values
y_train = df.loc[:25000, 'sentiment'].values
X_test = df.loc[25000:, 'review'].values
y_test = df.loc[25000:, 'sentiment'].values

tfidf = TfidfVectorizer(strip_accents=None,
                        lowercase=False,
                        preprocessor=None)

small_param_grid = [{'vect__ngram_range': [(1, 1)],
                     'vect__stop_words': [None],
                     'vect__tokenizer': [tokenizer, tokenizer_porter],
                     'clf__penalty': ['l2'],
                     'clf__C': [1.0, 10.0]},
                    {'vect__ngram_range': [(1, 1)],
                     'vect__stop_words': [stop, None],
                     'vect__tokenizer': [tokenizer],
                     'vect__use_idf': [False],
                     'vect__norm': [None],
                     'clf__penalty': ['l2'],
                     'clf__C': [1.0, 10.0]}]

lr_tfidf = Pipeline([('vect', tfidf),
                     ('clf', LogisticRegression(solver='liblinear'))])

gs_lr_tfidf = GridSearchCV(lr_tfidf, small_param_grid,
                           scoring='accuracy',
                           cv=5,
                           verbose=1,
                           n_jobs=-1)
gs_lr_tfidf.fit(X_train, y_train)
After the training process is finished, we can print out the hyperparameters associated with the best model, as well as the accuracy of the best model on both training data and test data.
print(f'Best parameter set: {gs_lr_tfidf.best_params_}')
print(f'CV Accuracy: {gs_lr_tfidf.best_score_:.3f}')
clf = gs_lr_tfidf.best_estimator_
print(f'Test Accuracy: {clf.score(X_test, y_test):.3f}')
#output
Best parameter set: {'clf__C': 10.0, 'clf__penalty': 'l2', 'vect__ngram_range': (1, 1), 'vect__stop_words': None, 'vect__tokenizer': <function tokenizer at 0x000001EB20C0B380>}
CV Accuracy: 0.897
Test Accuracy: 0.899
If you want to look at more details of the code, you can click the following link to download the Python source file ch08.py.
You can also watch the video for this application on our YouTube channel.
The post Download Python Course source files appeared first on We provide R, Python, Statistics Online-Learning Course.
The post How to create a data frame from nested dictionary with Pandas in Python appeared first on We provide R, Python, Statistics Online-Learning Course.
Pandas provides flexible ways of generating data frames. One of them is passing a nested dictionary to the pd.DataFrame() function. For example, ND1 is a nested dictionary.
ND1 = {'age':  {'VB1': 22, 'VB2': 33, 'VB3': 19},
       'name': {'VB1': 'wilson', 'VB2': 'shirley', 'VB3': 'mico'},
       'city': {'VB1': 'molde', 'VB2': 'molde', 'VB3': 'aukra'}}
When this dictionary is passed directly as an argument to the DataFrame() function, Pandas treats the outer keys of the nested dictionary as column names of the new data frame, and the inner keys as index labels. If any fields are unmatched or inconsistent during this interpretation, Pandas fills those missing places with NaN values.
#Import Pandas module
import pandas as pd
DF1 = pd.DataFrame(ND1)
DF1
#Output
age name city
VB1 22 wilson molde
VB2 33 shirley molde
VB3 19 mico aukra
In the example above, we can see that the keys ‘age’, ‘name’ and ‘city’ act as column labels, and the keys ‘VB1’, ‘VB2’, ‘VB3’ appear as index labels in the new data frame.
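To see the NaN behavior mentioned above, consider a hypothetical variant ND2 where the inner dictionary for ‘city’ has no entry for ‘VB3’:

```python
import pandas as pd

# 'city' has no value for 'VB3', so Pandas fills that cell with NaN
ND2 = {'age':  {'VB1': 22, 'VB2': 33, 'VB3': 19},
       'city': {'VB1': 'molde', 'VB2': 'molde'}}
DF2 = pd.DataFrame(ND2)
print(DF2)  # the VB3 row shows NaN in the 'city' column
```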
The post How to delete columns of a data frame in Python appeared first on We provide R, Python, Statistics Online-Learning Course.
A data frame is the tabular data object in Python. It can store a different data type for each column. If you want to remove unwanted columns from a data frame, you can use either the del statement or the drop() method. Next we show some examples of that.
#Import Pandas module
import pandas as pd
#create a dictionary
Dict4 = {'last': ['zhang', 'yue', 'lin', 'li', 'wang'],
         'first': ['wei', 'shirley', 'mico', 'miaomiao', 'maomao'],
         'age': [32, 34, 8, 14, 3],
         'city': ['molde', 'aukra', 'molde', 'aukra', 'molde']}
#create a data frame from the dictionary above
Df4 = pd.DataFrame(Dict4)
Df4
#output
last first age city
0 zhang wei 32 molde
1 yue shirley 34 aukra
2 lin mico 8 molde
3 li miaomiao 14 aukra
4 wang maomao 3 molde
#delete one column, using the del statement
del Df4['first']
Df4
#output
last age city
0 zhang 32 molde
1 yue 34 aukra
2 lin 8 molde
3 li 14 aukra
4 wang 3 molde
#create again same data frame
Df4 = pd.DataFrame(Dict4)
#using drop method to remove two columns
Df4= Df4.drop(['city','age'], axis=1)
Df4
#output
last first
0 zhang wei
1 yue shirley
2 lin mico
3 li miaomiao
4 wang maomao
Sometimes you may still need the removed column; in that case you can use the data frame's pop() method.
#create again data frame
Df4 = pd.DataFrame(Dict4)
#remove column 'city' and save this to an object.
Pop_col= Df4.pop('city')
#show the popped column, it is a Series
Pop_col
#output
0 molde
1 aukra
2 molde
3 aukra
4 molde
Name: city, dtype: object
#show the original data frame, the column 'city' has been removed
Df4
#output
last first age
0 zhang wei 32
1 yue shirley 34
2 lin mico 8
3 li miaomiao 14
4 wang maomao 3
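Note that the same drop() method can remove rows as well: with axis=0 (the default), the labels refer to the row index. A small sketch with a made-up data frame:

```python
import pandas as pd

Dict5 = {'last': ['zhang', 'yue', 'lin'],
         'first': ['wei', 'shirley', 'mico']}
Df5 = pd.DataFrame(Dict5)

# axis=0 (the default) drops by row index label
Df5 = Df5.drop([0, 2], axis=0)
print(Df5)  # only the row with index 1 remains
```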
For more examples on Python, you can view the playlists on our YouTube channel.
The post Using isin() to check membership of a data frame in Python appeared first on We provide R, Python, Statistics Online-Learning Course.
When a data frame in Python is created via the Pandas library, its membership can be checked using the isin() function. This works much like the same function applied to a Pandas Series; however, the returned object is now a data frame too.
For example, we create a data frame storing name, age and city information. Then we can check whether certain values of the data frame match the values specified in a list using isin(), and filter out those values.
#Import Pandas module
import pandas as pd
#create a dictionary as input for data frame creation
DICT2 = {'age': [32, 31, 19],
         'name': ["shirley", "wilson", "mico"],
         'city': ["London", "Tokyo", "Shanghai"]}
#creating data frame
DF2 = pd.DataFrame(DICT2)
DF2
#output
age name city
0 32 shirley London
1 31 wilson Tokyo
2 19 mico Shanghai
#check if values of the data frame match the values in the list
DF2.isin([19, 'London'])
#output
age name city
0 False False True
1 False False False
2 True False False
#we can further filter out the values that match the values in the list
DF2[DF2.isin([19, 'London'])]
#the result is a data frame too, with NaN for the non-matching values
age name city
0 NaN NaN London
1 NaN NaN NaN
2 19.0 NaN NaN
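A related and very common pattern is to call isin() on a single column and use the resulting Boolean Series to keep whole matching rows:

```python
import pandas as pd

DF3 = pd.DataFrame({'age': [32, 31, 19],
                    'name': ['shirley', 'wilson', 'mico'],
                    'city': ['London', 'Tokyo', 'Shanghai']})

# A Boolean mask on one column selects entire matching rows
subset = DF3[DF3['city'].isin(['London', 'Tokyo'])]
print(subset)  # keeps the rows for shirley and wilson
```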
The post How to assign values to Pandas data frame in Python appeared first on We provide R, Python, Statistics Online-Learning Course.
A data frame in Python is the data object that stores tabular information; it is provided by the Pandas library. Once a data frame is generated, its values can be assigned or updated. For example, we can first set new column and row index names, as shown in the following code example.
#Import the Pandas and NumPy modules
import numpy as np
import pandas as pd

DF1 = pd.DataFrame(np.arange(12).reshape((4, 3)),
                   index=['row1', 'row2', 'row3', 'row4'],
                   columns=['wilson', 'shirley', 'dudu'])
DF1
#output
wilson shirley dudu
row1 0 1 2
row2 3 4 5
row3 6 7 8
row4 9 10 11
#Set the row index and column names
DF1.index.name = 'Rows'
DF1.columns.name = 'Members'
DF1
#output
Members wilson shirley dudu
Rows
row1 0 1 2
row2 3 4 5
row3 6 7 8
row4 9 10 11
If you want to add a new column to an existing data frame, you can just put the new column name inside brackets after the data frame, then put the values, either a single value, a list, or a Series, on the right side of the assignment operator. The next examples show these operations.
#Assign a new column with same value for each row of this column
DF1['maomao'] = 8
DF1
#output
Members wilson shirley dudu maomao
Rows
row1 0 1 2 8
row2 3 4 5 8
row3 6 7 8 8
row4 9 10 11 8
#Or we can add a new column with values from a list
DF1['new1'] = [9,10,11,12]
DF1
#output
Members wilson shirley dudu maomao new1
Rows
row1 0 1 2 8 9
row2 3 4 5 8 10
row3 6 7 8 8 11
row4 9 10 11 8 12
#We can also create a Series and then assign it to a new column of the data frame
S1 = pd.Series(np.arange(4))
S1
#output
0 0
1 1
2 2
3 3
dtype: int32
DF1['new2'] = S1
DF1
#output
Members wilson shirley dudu maomao new1 new2
Rows
row1 0 1 2 8 9 NaN
row2 3 4 5 8 10 NaN
row3 6 7 8 8 11 NaN
row4 9 10 11 8 12 NaN
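The NaN values in column ‘new2’ come from index alignment: the labels 0 to 3 of S1 do not match the labels ‘row1’ to ‘row4’ of DF1. A small sketch of one possible fix is to give the Series the data frame’s own index (assigning S1.values would work too):

```python
import numpy as np
import pandas as pd

DF1 = pd.DataFrame(np.arange(12).reshape((4, 3)),
                   index=['row1', 'row2', 'row3', 'row4'],
                   columns=['wilson', 'shirley', 'dudu'])

# A Series is aligned on index labels during assignment, so give it
# the same labels as the data frame before assigning it
S1 = pd.Series(np.arange(4), index=DF1.index)
DF1['new2'] = S1
print(DF1['new2'])  # now row1..row4 hold 0..3 instead of NaN
```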