Programa de Pós-Graduação em Computação Aplicada - PPCA/NDAE/Tucuruí
Permanent URI for this community: https://repositorio.ufpa.br/handle/2011/9398
Browsing Programa de Pós-Graduação em Computação Aplicada - PPCA/NDAE/Tucuruí by Subject "Aprendizado de máquina" (Machine learning)
Now showing 1 - 4 of 4
Item: Open Access
Aplicação e comparação de técnicas de classificação automática de documentos: um estudo de caso com o dataset do domínio jurídico “Victor” (Universidade Federal do Pará, 2024-02-01)
MARTINS, Victor Simões; SILVA, Cleison Daniel; http://lattes.cnpq.br/1445401605385329; https://orcid.org/0000-0001-8280-2928
The application of Natural Language Processing (NLP) and Artificial Intelligence (AI) in the Brazilian legal context is a rapidly growing area that can alter the way legal professionals work, given the volume of text generated. Among the possible applications of NLP and AI is the automatic classification of documents, which, among other things, can be employed to automate the digitization of judicial proceedings that are still in physical form. Therefore, this work applies and compares AI algorithms for the classification of legal documents. The algorithms are divided into two approaches. The first approach (I) separates the computational representation of the text from the classifier training itself and applies SVM and Logistic Regression in conjunction with computational representations based on TF-IDF, Word2Vec, FastText, and BERT. The second approach (II) performs the computational representation of documents and the training of the classifier simultaneously, applying Deep Learning algorithms based on recurrent neural networks, specifically ULMFiT (Universal Language Model Fine-tuning) and HAN (Hierarchical Attention Networks). The dataset studied is VICTOR, which is composed of documents from the Supreme Federal Court (STF) of Brazil. The research concludes that both approaches can be applied to the classification of legal documents from this dataset. Additionally, despite being less computationally expensive, the classification pipelines of Approach I that represent the document with TF-IDF yield results equivalent to pipelines employing Deep Learning.
Furthermore, specializing the document embeddings with data from the dataset under study improves the performance of pipelines that employ Word2Vec, FastText, and ULMFiT compared to pipelines that apply their generic representations, i.e., models pre-trained on general-domain data.

Item: Open Access
Clusterização de padrões espaço-temporais de precipitação na Amazônia via deep convolutional autoencoder (Universidade Federal do Pará, 2023-07-07)
SILVA, Vander Augusto Oliveira da; TEIXEIRA, Raphael Barros; http://lattes.cnpq.br/4902824086591521; https://orcid.org/0000-0003-2993-802X
Studies using different machine learning methods for knowledge discovery and pattern recognition in precipitation time series are increasingly frequent in the literature. Identifying and analyzing patterns in the precipitation time series of a particular region is fundamental for its socioeconomic development. Knowledge and understanding of regional rainfall characteristics are therefore important to enable planning for the use, management, and conservation of water resources. The natural phenomenon of precipitation is a fundamental process with a direct impact on watersheds and on human and environmental development. The variability of this phenomenon has important implications for the navigability of rivers, individual abundance, and species richness. In recent years, many studies with this approach have been carried out in Brazil, mainly in the Amazon region. This research aimed to develop a computational method for analyzing precipitation time series using unsupervised machine learning techniques, in order to propose a method capable of extracting complex features from the data, obtaining a low-dimensional attribute map for pattern recognition, discovering regions that are homogeneous with respect to precipitation, and approximately reconstructing precipitation time series in the Legal Amazon.
The proposed deep learning neural network model is trained to learn the main and most complex features of the original data and present them at low dimensionality in latent space. After training, the results are promising: the reconstructed data showed good performance as evaluated by the RMSE and NRMSE metrics, with values of 0.06610 and 0.3355, respectively. The low-dimensional representation of the data was then analyzed with hierarchical agglomerative clustering using Ward’s method. This methodology also showed good results, as it produced consistent groupings characterizing homogeneous regions in relation to precipitation data. This demonstrates that the low-dimensional representation carried the main characteristics of the analyzed time series. It is noteworthy that the method developed in this study can be applied not only in the Amazon region, but also in other areas with similar challenges related to time series analysis.

Item: Open Access
Implementação de modelos computacionais na predição temporal e espaço-temporal de parâmetros de qualidade de água (Universidade Federal do Pará, 2021-12-14)
ALMEIDA, Anderson Francisco de Sousa; MERLIN, Bruno; http://lattes.cnpq.br/7336467549495208; HTTPS://ORCID.ORG/0000-0001-7327-9960; GONZÁLEZ, Marcos Tulio Amaris; http://lattes.cnpq.br/9970287865377659
The quality of water is directly related to its level of pollution caused by anthropic and industrial actions, with a consequent reduction in the availability of quality water. Therefore, limnological monitoring of the basic parameters of water quality is carried out as a way of obtaining data that guide the decision-making of water resources management bodies. In this context, the present study implements machine learning algorithms to predict temporal and spatiotemporal water quality parameter data.
The ML techniques used were linear regression, random forest, and MLP and LSTM neural networks. Two collection points from a Water Resources Management Unit in São Paulo, Brazil were used. Models are evaluated using the MAPE (mean absolute percentage error) and RMSE (root mean squared error) metrics. In temporal prediction, the LSTM technique presented the best performance relative to the other techniques on the data used, as it has the lowest average RMSE, 2.47. In spatiotemporal prediction, however, MLP has the best performance relative to the other techniques on the data used, as it has the lowest average MAPE and RMSE, 5.94% and 1.34, respectively. These performances of the neural networks can be justified by the non-linearity of the parameter data. Beyond that, the results of the experiments aim to contribute to the water quality monitoring process and assist in the planning of water management, so that it meets current legislation and enables the indication of public policies, through machine learning models for predicting water quality parameters.

Item: Open Access
PredictmodelGUI: ferramenta para classificação de genes essenciais através de técnicas de aprendizado de máquina (Universidade Federal do Pará, 2025-06-06)
MOIA, Gislenne da Silva; SILVA, Cleison Daniel; http://lattes.cnpq.br/1445401605385329; HTTPS://ORCID.ORG/0000-0001-8280-2928; VERAS, Adonney Allan de Oliveira; http://lattes.cnpq.br/2201652617167877; https://orcid.org/0000-0002-7227-0590
DNA sequencing technologies have provided significant advances in the understanding of the genetic content of numerous organisms, ranging from microorganisms to humans. Among the analyses performed in the omics sciences, annotation stands out as one of the most important.
Conceptually, this process consists of inferring biological information from genomic sequences, which allows researchers to understand the function of genetic products such as genes, the basic units of heredity responsible for the physical and hereditary characteristics of an organism. Some genes perform vital functions by encoding proteins or RNAs essential for processes such as cellular metabolism, which participate in crucial pathways like glycolysis and the tricarboxylic acid cycle. Sequencing platforms have started to generate large volumes of data, which has driven advances in the omics fields and fostered the development of computational methods aimed at diverse analyses. More recently, Machine Learning and Artificial Intelligence techniques have been applied to these data, with studies demonstrating the effectiveness of biology-inspired approaches. These models do not require rule-based programming, although their creation still demands advanced skills in programming and computing. To contribute toward solving this challenge, this study presents PredictModelGUI, a graphical interface developed in Python that implements nine models to classify essential genes. The interface allows importing datasets, re-training models, and adjusting parameters. The information is stored in the software database, which ensures traceability and provides a simple and intuitive tool to test different configurations. Available