The post How to create an Android mobile app with a deep learning AI model? appeared first on We provide R, Python, Statistics Online-Learning Course.
1. Define the AI Model’s Objective and Gather Data:
2. Develop and Train the Deep Learning Model:
3. Optimize and Convert the Model for Mobile Deployment:
Convert your trained model to a mobile-optimized format like TensorFlow Lite (LiteRT) for efficient execution on Android devices. This often involves quantization and pruning to reduce model size and improve inference speed.
If the model is too complex for on-device processing, deploy it to a cloud service (e.g., Google Cloud AI Platform, AWS SageMaker) and access it via APIs from your Android app.
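As an illustration of the conversion step, a minimal sketch in Python might look as follows. It builds a tiny stand-in Keras model (in practice you would load your own trained model; the file name my_model.tflite is just an example) and converts it with default dynamic-range quantization:

```python
import tensorflow as tf

# A tiny stand-in model; in practice you would load your trained model
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)),
                             tf.keras.layers.Dense(2, activation='softmax')])

# Convert to TensorFlow Lite; Optimize.DEFAULT enables dynamic-range
# quantization, which shrinks the model and speeds up inference
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Write the .tflite file that an Android app can bundle as an asset
with open('my_model.tflite', 'wb') as f:
    f.write(tflite_model)
```

The resulting .tflite file can then be bundled as an asset in the Android project and run with the TensorFlow Lite (LiteRT) Interpreter.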
4. Develop the Android Application:
5. Integrate the AI Model into the Android App:
6. Test and Optimize:
7. Monitor and Iterate:
The post Python Machine Learning Source Files appeared first on We provide R, Python, Statistics Online-Learning Course.
The post Install PyTorch on Windows appeared first on We provide R, Python, Statistics Online-Learning Course.
Before we run the PyTorch installation command in a prompt window, e.g. an Anaconda prompt, the CUDA Toolkit must be installed on your computer. The CUDA Toolkit can be downloaded from the following link: https://developer.nvidia.com/cuda-12-6-0-download-archive?target_os=Windows , or you can just google ‘install cuda‘.

The CUDA Toolkit lets the GPU in your computer run deep learning models much faster than the CPU. Next, download the exe file for the CUDA Toolkit.
However, before you can install and use CUDA on Windows, you must have Visual Studio installed first. Again, we can search ‘Visual Studio download‘ and easily find the latest version of Visual Studio Community, which is totally free. Next, install both Visual Studio and the CUDA Toolkit.

You can run the following command to confirm that the CUDA Toolkit is correctly installed: nvcc --version

After both Visual Studio and the CUDA Toolkit are successfully installed on your Windows operating system (my computer just has an older version of CUDA), we can run the PyTorch installation command in an Anaconda prompt window.
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu126
After a while, the torch and torchvision packages are installed, and you can use the following commands in an Anaconda prompt window to check that PyTorch is correctly installed, can be imported in Python, and can use the CUDA Toolkit.
(base) C:\Users\Wilso>python
Python 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:03:56) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>>
Once you see this output, you can start using PyTorch in Python for deep learning!
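Beyond checking torch.cuda.is_available(), a quick sanity test is to run a small computation on the selected device. This is a minimal sketch that falls back to the CPU when no GPU is present:

```python
import torch

# Pick the GPU when CUDA is usable, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(torch.__version__, '| running on', device)

# A tiny computation on the chosen device
x = torch.ones(3, device=device)
y = (x * 2).sum()
print(y.item())  # prints 6.0
```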
You can also watch the PyTorch installation video on our YouTube channel.
The post Topic Modeling using Latent Dirichlet Allocation with Python appeared first on We provide R, Python, Statistics Online-Learning Course.
The specific technique applied in topic modeling here is called Latent Dirichlet Allocation (LDA). LDA is a Bayesian statistical approach that tries to find groups of key words that frequently appear together across text examples. These key words represent the aspects of each topic.
In essence, LDA takes the bag-of-words matrix built from the preprocessed texts as input and decomposes it into two new matrices: a document-to-topic matrix and a word-to-topic matrix. Because the product of these two matrices approximates the input bag-of-words matrix, LDA effectively searches for topics that can reproduce the bag-of-words matrix with the lowest possible error.
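The decomposition idea can be illustrated with a toy NumPy example (all numbers below are made up): multiplying a small document-to-topic matrix by a word-to-topic matrix gives back an approximation of the bag-of-words matrix.

```python
import numpy as np

# Toy illustration (made-up numbers): 4 documents, 2 topics, 6 words
doc_topic = np.array([[0.9, 0.1],    # document 1 is mostly topic 1
                      [0.8, 0.2],
                      [0.1, 0.9],    # document 3 is mostly topic 2
                      [0.2, 0.8]])
topic_word = np.array([[5., 4., 3., 0., 0., 1.],   # topic 1 word weights
                       [0., 1., 0., 4., 5., 3.]])  # topic 2 word weights

# Their product approximates the 4 x 6 bag-of-words count matrix
approx_bow = doc_topic @ topic_word
print(approx_bow.shape)  # (4, 6)
```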
The following code snippet shows an application of topic modeling to IMDB movie review texts.
# Step 1: read the csv IMDB review data
# source from link 'http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz'
# has been downloaded and extracted.
import os
import sys
import numpy as np
import pandas as pd
import pyprind

basepath = 'aclImdb'
labels = {'pos': 1, 'neg': 0}
pbar = pyprind.ProgBar(50000, stream=sys.stdout)
df = pd.DataFrame()
for s in ('test', 'train'):
    for l in ('pos', 'neg'):
        path = os.path.join(basepath, s, l)
        for file in sorted(os.listdir(path)):
            with open(os.path.join(path, file),
                      'r', encoding='utf-8') as infile:
                txt = infile.read()
            df = pd.concat([df, pd.DataFrame([txt, labels[l]]).transpose()],
                           ignore_index=True)
            pbar.update()
df.columns = ['review', 'sentiment']
# Shuffling the DataFrame:
np.random.seed(0)
df = df.reindex(np.random.permutation(df.index))
# Saving the assembled data as a CSV file:
df.to_csv('movie_data.csv', index=False, encoding='utf-8')
df = pd.read_csv('movie_data.csv', encoding='utf-8')
# the following renaming is necessary on some computers:
df = df.rename(columns={"0": "review", "1": "sentiment"})
df.head(3)
#output: the first three review texts and their labels (note: in topic modeling we do not use the label information)
review sentiment
0 In 1974, the teenager Martha Moxley (Maggie Gr... 1
1 OK... so... I really like Kris Kristofferson a... 0
2 ***SPOILER*** Do not read this, if you think a... 0
#Step 2: create the bag-of-words matrix, a 50,000 x 5,000 matrix (50,000 texts, 5,000-word vocabulary)
from sklearn.feature_extraction.text import CountVectorizer

count = CountVectorizer(stop_words='english',
                        max_df=.1,
                        max_features=5000)
X = count.fit_transform(df['review'].values)
#Step 3: create the LDA model and train it with the bag-of-words matrix as input.
#We set 10 topics here, and each iteration uses all information from the matrix ('batch')
from sklearn.decomposition import LatentDirichletAllocation

lda = LatentDirichletAllocation(n_components=10,
                                random_state=123,
                                learning_method='batch')
X_topics = lda.fit_transform(X)
#Step 4: print the 5 most important words for each topic
n_top_words = 5
feature_names = count.get_feature_names_out()
for topic_idx, topic in enumerate(lda.components_):
    print(f'Topic {topic_idx + 1}:')
    print(' '.join([feature_names[i] for i in
                    topic.argsort()[:-n_top_words - 1:-1]]))
#output
Topic 1:
worst minutes awful script stupid
Topic 2:
family mother father children girl
Topic 3:
american war dvd music tv
Topic 4:
human audience cinema art sense
Topic 5:
police guy car dead murder
Topic 6:
horror house sex girl woman
Topic 7:
role performance comedy actor performances
Topic 8:
series episode war episodes tv
Topic 9:
book version original read novel
Topic 10:
action fight guy guys cool
Based on the 5 most important words for each topic, we may propose the following candidate topics for the IMDB movie review texts:
Generally bad movies (not really a topic category)
Movies about families
War movies
Art movies
Crime movies
Horror movies
Comedies
Movies somehow related to TV shows
Movies based on books
Action movies
#Step 5: to confirm our guess about the topics, we print the 3 review texts
#that have the highest probability of belonging to the topic 'Horror movies'.
horror = X_topics[:, 5].argsort()[::-1]
for iter_idx, movie_idx in enumerate(horror[:3]):
    print(f'\nHorror movie #{iter_idx + 1}:')
    print(df['review'][movie_idx][:300], '...')
#output
Horror movie #1:
House of Dracula works from the same basic premise as House of Frankenstein from the year before; namely that Universal's three most famous monsters; Dracula, Frankenstein's Monster and The Wolf Man are appearing in the movie together. Naturally, the film is rather messy therefore, but the fact that ...
Horror movie #2:
Okay, what the hell kind of TRASH have I been watching now? "The Witches' Mountain" has got to be one of the most incoherent and insane Spanish exploitation flicks ever and yet, at the same time, it's also strangely compelling. There's absolutely nothing that makes sense here and I even doubt there ...
Horror movie #3:
<br /><br />Horror movie time, Japanese style. Uzumaki/Spiral was a total freakfest from start to finish. A fun freakfest at that, but at times it was a tad too reliant on kitsch rather than the horror. The story is difficult to summarize succinctly: a carefree, normal teenage girl starts coming fac ...
You can also watch a video with more details of this topic model application on our YouTube channel.
The post Document sentiment classification using bag-of-words in Python appeared first on We provide R, Python, Statistics Online-Learning Course.
For online Python training registration, click here!
Sentiment classification is a supervised machine learning task and a subfield of natural language processing (NLP). With classification algorithms, such as logistic regression, text data can be trained against labels, e.g. positive and negative.
The main steps of a sentiment classification implementation are: loading and cleaning the text data, converting the texts into numeric features (for example tf-idf vectors), training a classifier, and evaluating it on held-out data.
In the following example, we show how to perform a sentiment classification task on movie review data from IMDB.
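Before the full IMDB example, here is a minimal toy sketch of the whole pipeline; the four texts and their labels are made up for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Made-up mini dataset: 1 = positive review, 0 = negative review
texts = ['a wonderful, touching film with a great cast',
         'absolutely terrible acting and a boring story',
         'great story, great cast, great direction',
         'boring, awful and terrible from start to finish']
labels = [1, 0, 1, 0]

# tf-idf features feeding a logistic regression classifier
clf = Pipeline([('vect', TfidfVectorizer()),
                ('clf', LogisticRegression(solver='liblinear'))])
clf.fit(texts, labels)
print(clf.predict(['a great film', 'terrible and boring']))
```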
After the data is downloaded and extracted, we load it into the Python working session.
import os
import sys
import numpy as np
import pandas as pd
import pyprind

basepath = 'aclImdb'
labels = {'pos': 1, 'neg': 0}
pbar = pyprind.ProgBar(50000, stream=sys.stdout)
df = pd.DataFrame()
for s in ('test', 'train'):
    for l in ('pos', 'neg'):
        path = os.path.join(basepath, s, l)
        for file in sorted(os.listdir(path)):
            with open(os.path.join(path, file),
                      'r', encoding='utf-8') as infile:
                txt = infile.read()
            df = pd.concat([df, pd.DataFrame([txt, labels[l]]).transpose()],
                           ignore_index=True)
            pbar.update()
Then we can show the data contents after the rows of the data frame are shuffled.
df.columns = ['review', 'sentiment']
np.random.seed(0)
df = df.reindex(np.random.permutation(df.index))
df.head(3)
df.shape
The following code creates the tf-idf transformer and preprocesses the raw text data: we remove HTML tags and keep emoticons in the text. Several helper functions for tokenizing and stop-word removal are created as well.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfTransformer

tfidf = TfidfTransformer(use_idf=True, norm=None, smooth_idf=True)

def preprocessor(text):
    text = re.sub('<[^>]*>', '', text)
    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)',
                           text)
    text = (re.sub(r'[\W]+', ' ', text.lower()) +
            ' '.join(emoticons).replace('-', ''))
    return text

df['review'] = df['review'].apply(preprocessor)

porter = PorterStemmer()

def tokenizer(text):
    return text.split()

def tokenizer_porter(text):
    return [porter.stem(word) for word in text.split()]

nltk.download('stopwords')
stop = stopwords.words('english')
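To see what the preprocessor does, here it is again as a self-contained snippet applied to a small made-up sample string:

```python
import re

def preprocessor(text):
    # drop HTML tags, collect emoticons, lowercase, strip punctuation,
    # then re-append the emoticons without their '-' noses
    text = re.sub('<[^>]*>', '', text)
    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)', text)
    text = (re.sub(r'[\W]+', ' ', text.lower()) +
            ' '.join(emoticons).replace('-', ''))
    return text

print(preprocessor('</a>This :) is :( a test :-)!'))
# prints: this is a test :) :( :)
```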
Next we can train the model using logistic regression. The search for the best hyperparameter combination is carried out with grid search.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X_train = df.loc[:25000, 'review'].values
y_train = df.loc[:25000, 'sentiment'].values
X_test = df.loc[25000:, 'review'].values
y_test = df.loc[25000:, 'sentiment'].values

tfidf = TfidfVectorizer(strip_accents=None,
                        lowercase=False,
                        preprocessor=None)

small_param_grid = [{'vect__ngram_range': [(1, 1)],
                     'vect__stop_words': [None],
                     'vect__tokenizer': [tokenizer, tokenizer_porter],
                     'clf__penalty': ['l2'],
                     'clf__C': [1.0, 10.0]},
                    {'vect__ngram_range': [(1, 1)],
                     'vect__stop_words': [stop, None],
                     'vect__tokenizer': [tokenizer],
                     'vect__use_idf': [False],
                     'vect__norm': [None],
                     'clf__penalty': ['l2'],
                     'clf__C': [1.0, 10.0]}]

lr_tfidf = Pipeline([('vect', tfidf),
                     ('clf', LogisticRegression(solver='liblinear'))])

gs_lr_tfidf = GridSearchCV(lr_tfidf, small_param_grid,
                           scoring='accuracy',
                           cv=5,
                           verbose=1,
                           n_jobs=-1)
gs_lr_tfidf.fit(X_train, y_train)
After the training process is finished, we can print out the hyperparameters associated with the best model, as well as the accuracy of the best model on both training data and test data.
print(f'Best parameter set: {gs_lr_tfidf.best_params_}')
print(f'CV Accuracy: {gs_lr_tfidf.best_score_:.3f}')
clf = gs_lr_tfidf.best_estimator_
print(f'Test Accuracy: {clf.score(X_test, y_test):.3f}')
#output
Best parameter set: {'clf__C': 10.0, 'clf__penalty': 'l2', 'vect__ngram_range': (1, 1), 'vect__stop_words': None, 'vect__tokenizer': <function tokenizer at 0x000001EB20C0B380>}
CV Accuracy: 0.897
Test Accuracy: 0.899
If you want to look at more details of the code, you can click the following link to download the Python source file ch08.py.
You can also watch the video for this application on our YouTube channel.
The post Download Python Course source files appeared first on We provide R, Python, Statistics Online-Learning Course.
The post How to create a data frame from nested dictionary with Pandas in Python appeared first on We provide R, Python, Statistics Online-Learning Course.
Pandas provides flexible ways of generating data frames. One of them is passing a nested dictionary to the pd.DataFrame() function. For example, ND1 is a nested dictionary.
ND1 = {'age':  {'VB1': 22, 'VB2': 33, 'VB3': 19},
       'name': {'VB1': 'wilson', 'VB2': 'shirley', 'VB3': 'mico'},
       'city': {'VB1': 'molde', 'VB2': 'molde', 'VB3': 'aukra'}}
When this dictionary is passed directly as an argument to the DataFrame() function, Pandas treats the outer keys of the nested dictionary as column names of the new data frame, and the inner keys as index labels. If any fields are unmatched or inconsistent during this interpretation, Pandas fills those missing places with NaN values.
#Import Pandas module
import pandas as pd
DF1 = pd.DataFrame(ND1)
DF1
#Output
age name city
VB1 22 wilson molde
VB2 33 shirley molde
VB3 19 mico aukra
In the example above, we can see that the keys ‘age’, ‘name’ and ‘city’ act as column labels, and the keys ‘VB1’, ‘VB2’, ‘VB3’ appear as index labels in the new data frame.
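To see the NaN behavior mentioned above, consider a hypothetical variant ND2 where the inner dictionary for ‘city’ has no entry for ‘VB3’:

```python
import pandas as pd

# 'city' has no value for 'VB3', so Pandas fills that cell with NaN
ND2 = {'age':  {'VB1': 22, 'VB2': 33, 'VB3': 19},
       'city': {'VB1': 'molde', 'VB2': 'molde'}}
DF2 = pd.DataFrame(ND2)
print(DF2)  # the VB3 row shows NaN in the 'city' column
```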
The post How to delete columns of a data frame in Python appeared first on We provide R, Python, Statistics Online-Learning Course.
A data frame is the tabular data object in Python. It can store a different data type for each column. If you want to remove unwanted columns from a data frame, you can use either the del statement or the drop() method. Next we show some examples of that.
#Import Pandas module
import pandas as pd
#create a dictionary
Dict4 = {'last': ['zhang', 'yue', 'lin', 'li', 'wang'],
         'first': ['wei', 'shirley', 'mico', 'miaomiao', 'maomao'],
         'age': [32, 34, 8, 14, 3],
         'city': ['molde', 'aukra', 'molde', 'aukra', 'molde']}
#create a data frame from the dictionary above
Df4 = pd.DataFrame(Dict4)
Df4
#output
last first age city
0 zhang wei 32 molde
1 yue shirley 34 aukra
2 lin mico 8 molde
3 li miaomiao 14 aukra
4 wang maomao 3 molde
#delete one column, using the del statement
del Df4['first']
Df4
#output
last age city
0 zhang 32 molde
1 yue 34 aukra
2 lin 8 molde
3 li 14 aukra
4 wang 3 molde
#create again same data frame
Df4 = pd.DataFrame(Dict4)
#using drop method to remove two columns
Df4= Df4.drop(['city','age'], axis=1)
Df4
#output
last first
0 zhang wei
1 yue shirley
2 lin mico
3 li miaomiao
4 wang maomao
Sometimes you may still need the removed column; in that case you can use the data frame's pop() method.
#create again data frame
Df4 = pd.DataFrame(Dict4)
#remove column 'city' and save this to an object.
Pop_col= Df4.pop('city')
#show the popped column, it is a Series
Pop_col
#output
0 molde
1 aukra
2 molde
3 aukra
4 molde
Name: city, dtype: object
#show the original data frame, the column 'city' has been removed
Df4
#output
last first age
0 zhang wei 32
1 yue shirley 34
2 lin mico 8
3 li miaomiao 14
4 wang maomao 3
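Note that the same drop() method can remove rows as well: with axis=0 (the default), the labels refer to the row index. A small sketch with a made-up data frame:

```python
import pandas as pd

Dict5 = {'last': ['zhang', 'yue', 'lin'],
         'first': ['wei', 'shirley', 'mico']}
Df5 = pd.DataFrame(Dict5)

# axis=0 (the default) drops by row index label
Df5 = Df5.drop([0, 2], axis=0)
print(Df5)  # only the row with index 1 remains
```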
For more examples on Python, you can view the playlists on our YouTube channel.
The post Using isin() to check membership of a data frame in Python appeared first on We provide R, Python, Statistics Online-Learning Course.
When a data frame in Python is created via the Pandas library, its membership can be checked using the isin() function. This works much like the same function applied to a Pandas Series; however, the returned object is now a data frame too.
For example, we create a data frame storing name, age and city information. Then we can check whether certain values of the data frame match the values specified in a list using isin(), and filter out those values.
#Import Pandas module
import pandas as pd
#create a dictionary as input for data frame creation
DICT2 = {'age': [32, 31, 19],
         'name': ["shirley", "wilson", "mico"],
         'city': ["London", "Tokyo", "Shanghai"]}
#creating data frame
DF2 = pd.DataFrame(DICT2)
DF2
#output
age name city
0 32 shirley London
1 31 wilson Tokyo
2 19 mico Shanghai
#check if values of the data frame match the values in the list
DF2.isin([19, 'London'])
#output
age name city
0 False False True
1 False False False
2 True False False
#we can further filter out the values that match the values in the list
DF2[DF2.isin([19, 'London'])]
#the result is a data frame too, with NaN for the non-matching values
age name city
0 NaN NaN London
1 NaN NaN NaN
2 19.0 NaN NaN
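A related and very common pattern is to call isin() on a single column and use the resulting Boolean Series to keep whole matching rows:

```python
import pandas as pd

DF3 = pd.DataFrame({'age': [32, 31, 19],
                    'name': ['shirley', 'wilson', 'mico'],
                    'city': ['London', 'Tokyo', 'Shanghai']})

# A Boolean mask on one column selects entire matching rows
subset = DF3[DF3['city'].isin(['London', 'Tokyo'])]
print(subset)  # keeps the rows for shirley and wilson
```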
The post How to assign values to Pandas data frame in Python appeared first on We provide R, Python, Statistics Online-Learning Course.
A data frame in Python is the data object that stores tabular information; it is provided by the Pandas library. Once a data frame is generated, its values can be assigned or updated. For example, we can first set new column and row index names, as shown in the following code example.
#Import the Pandas and NumPy modules
import numpy as np
import pandas as pd

DF1 = pd.DataFrame(np.arange(12).reshape((4, 3)),
                   index=['row1', 'row2', 'row3', 'row4'],
                   columns=['wilson', 'shirley', 'dudu'])
DF1
#output
wilson shirley dudu
row1 0 1 2
row2 3 4 5
row3 6 7 8
row4 9 10 11
#Set the row index and column names
DF1.index.name = 'Rows'
DF1.columns.name = 'Members'
DF1
#output
Members wilson shirley dudu
Rows
row1 0 1 2
row2 3 4 5
row3 6 7 8
row4 9 10 11
If you want to add a new column to an existing data frame, you can just put the new column name inside brackets after the data frame, then put the values, either a single value, a list, or a Series, on the right side of the assignment operator. The next examples show these operations.
#Assign a new column with same value for each row of this column
DF1['maomao'] = 8
DF1
#output
Members wilson shirley dudu maomao
Rows
row1 0 1 2 8
row2 3 4 5 8
row3 6 7 8 8
row4 9 10 11 8
#Or we can add a new column with values from a list
DF1['new1'] = [9,10,11,12]
DF1
#output
Members wilson shirley dudu maomao new1
Rows
row1 0 1 2 8 9
row2 3 4 5 8 10
row3 6 7 8 8 11
row4 9 10 11 8 12
#We can also create a Series and then assign it to a new column of the data frame
S1 = pd.Series(np.arange(4))
S1
#output
0 0
1 1
2 2
3 3
dtype: int32
DF1['new2'] = S1
DF1
#output
Members wilson shirley dudu maomao new1 new2
Rows
row1 0 1 2 8 9 NaN
row2 3 4 5 8 10 NaN
row3 6 7 8 8 11 NaN
row4 9 10 11 8 12 NaN
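The NaN values in column ‘new2’ come from index alignment: the labels 0 to 3 of S1 do not match the labels ‘row1’ to ‘row4’ of DF1. A small sketch of one possible fix is to give the Series the data frame’s own index (assigning S1.values would work too):

```python
import numpy as np
import pandas as pd

DF1 = pd.DataFrame(np.arange(12).reshape((4, 3)),
                   index=['row1', 'row2', 'row3', 'row4'],
                   columns=['wilson', 'shirley', 'dudu'])

# A Series is aligned on index labels during assignment, so give it
# the same labels as the data frame before assigning it
S1 = pd.Series(np.arange(4), index=DF1.index)
DF1['new2'] = S1
print(DF1['new2'])  # now row1..row4 hold 0..3 instead of NaN
```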